
BURGER'S

MEDICINAL CHEMISTRY 

AND 

DRUG DISCOVERY 

Sixth Edition 

Volume 1: Drug Discovery



Edited by 



Donald J. Abraham 

Department of Medicinal Chemistry 
School of Pharmacy 

Virginia Commonwealth University








Burger's Medicinal Chemistry and Drug Discovery 
is available Online in full color at 
www.mrw.interscience.wiley.com/bmcdd. 




WILEY-INTERSCIENCE

A John Wiley and Sons, Inc., Publication




BURGER MEMORIAL EDITION



The Sixth Edition of Burger's Medicinal 
Chemistry and Drug Discovery is being designated
as a Memorial Edition. Professor Alfred
Burger was born in Vienna, Austria on Sep- 
tember 6, 1905 and died on December 30, 
2000. Dr. Burger received his Ph.D. from the 
University of Vienna in 1928 and joined the 
Drug Addiction Laboratory in the Department
of Chemistry at the University of Virginia in
1929. During his early years at UVA, he syn- 
thesized fragments of the morphine molecule 
in an attempt to find the analgesic pharma- 
cophore. He joined the UVA chemistry faculty 
in 1938 and served the department until his 
retirement in 1970. The chemistry depart- 
ment at UVA became the major academic 
training ground for medicinal chemists be- 
cause of Professor Burger. 

Dr. Burger's research focused on analge- 
sics, antidepressants, and chemotherapeutic 
agents. He is one of the few academicians to 
have a drug, designed and synthesized in his
laboratories, brought to market [Parnate,
which is the brand name for tranylcypromine, 
a monoamine oxidase (MAO) inhibitor]. Dr. 
Burger was a visiting Professor at the Univer- 
sity of Hawaii and lectured throughout the 
world. He founded the Journal of Medicinal 
Chemistry and Medicinal Chemistry Research,
and published the first major reference work 
"Medicinal Chemistry" in two volumes in 
1951. His last published work, a book, was 
written at age 90 (Understanding Medications:
What the Label Doesn't Tell You, June
1995). Dr. Burger received the Louis Pasteur
Medal of the Pasteur Institute and the American
Chemical Society Smissman Award. Dr.
Burger played the violin and loved classical
music. He was married for 65 years to Frances
Page Burger, a genteel Virginia lady who al- 
ways had a smile and an open house for the 
Professor's graduate students and postdoc- 
toral fellows. 
PREFACE



The Editors, Editorial Board Members, and 
John Wiley and Sons have worked for three
and a half years to update the fifth edition of 
Burger's Medicinal Chemistry and Drug Dis- 
covery. The sixth edition has several new and 
unique features. For the first time, there will
be an online version of this major reference
work. The online version will permit updating
and easy access. For the first time, all volumes
are structured entirely according to content 
and published simultaneously. Our intention 
was to provide a spectrum of fields that would 
provide new or experienced medicinal chem- 
ists, biologists, pharmacologists and molecu- 
lar biologists entry to their subjects of interest 
as well as provide a current and global per- 
spective of drug design, and drug develop- 
ment. 

Our hope was to make this edition of 
Burger the most comprehensive and useful 
published to date. To accomplish this goal, we 
expanded the content from 69 chapters (5 vol- 
umes) by approximately 50% (to over 100 
chapters in 6 volumes). We are greatly in debt 
to the authors and editorial board members 
participating in this revision of the major ref- 
erence work in our field. Several new subject 
areas have emerged since the fifth edition ap- 
peared. Proteomics, genomics, bioinformatics, 
combinatorial chemistry, high-throughput 
screening, blood substitutes, allosteric effec- 
tors as potential drugs, COX inhibitors, the 
statins, and high-throughput pharmacology 
are only a few. In addition to the new areas, we
have filled in gaps in the fifth edition by including
topics that were not covered. In the
sixth edition, we devote an entire subsection
of Volume 4 to cancer research; we have also 
reviewed the major published Medicinal 
Chemistry and Pharmacology texts to ensure 
that we did not omit any major therapeutic 
classes of drugs. An editorial board was consti- 
tuted for the first time to also review and sug- 
gest topics for inclusion. Their help was 
greatly appreciated. The newest innovation in 
this series will be the publication of an academic,
"textbook-like" version titled "Burger's
Fundamentals of Medicinal Chemistry."
The academic text is to be published about a
year after this reference work appears. It will
also appear with soft cover. Appropriate and
key information will be extracted from the major
reference.

There are numerous colleagues, friends, 
and associates to thank for their assistance. 
First and foremost is Assistant Editor Dr.
John Andrako, Professor emeritus, Virginia 
Commonwealth University, School of Phar- 
macy. John and I met almost every Tuesday 
for over three years to map out and execute 
the game plan for the sixth edition. His contribution
to the sixth edition cannot be overstated.
Ms. Susanne Steitz, Editorial Program
Coordinator at Wiley, tirelessly and meticu- 
lously kept us on schedule. Her contribution 
was also key in helping encourage authors to 
return manuscripts and revisions so we could 
publish the entire set at once. I would also like 
to especially thank colleagues who attended 
the QSAR Gordon Conference in 1999 for very 
helpful suggestions, especially Roy Vaz, John 
Mason, Yvonne Martin, John Block, and Hugo 
Kubinyi. The editors are greatly indebted to 
Professor Peter Ruenitz for preparing a tem- 
plate chapter as a guide for all authors. My 
secretary, Michelle Craighead, deserves spe- 
cial thanks for helping contact authors and 
reading the several thousand e-mails gener- 
ated during the project. I also thank the com- 
puter center at Virginia Commonwealth Uni- 
versity for suspending rules on storage and 
e-mail so that we might safely store all the 
versions of the authors' manuscripts where
they could be backed up daily. Last and not 
least, I want to thank each and every author, 
some of whom tackled two chapters. Their 
contributions have provided our field with a
sound foundation of information to build for 
the future. We thank the many reviewers of 
manuscripts whose critiques have greatly en- 
hanced the presentation and content for the 
sixth edition. Special thanks to Professors 
Richard Glennon, William Soine, Richard 
Westkaemper, Umesh Desai, Glen Kellogg,
Brad Windle, Lemont Kier, Malgorzata
Dukat, Martin Safo, Jason Rife, Kevin Reyn- 
olds, and John Andrako in our Department 
of Medicinal Chemistry, School of Pharmacy, 
Virginia Commonwealth University for sug- 
gestions and special assistance in reviewing 
manuscripts and text. Graduate student 
Derek Cashman took able charge of our web 
site, http://www.burgersmedchem.com, another
first for this reference work. I would especially
like to thank my dean, Victor
Yanchick, and Virginia Commonwealth Uni- 
versity for their support and encouragement. 
Finally, I thank my wife Nancy who under- 
stood the magnitude of this project and pro- 
vided insight on how to set up our home office 
as well as provide John Andrako and me 
lunchtime menus where we often dreamed of 
getting chapters completed in all areas we se- 
lected. To everyone involved, many, many 
thanks. 

Donald J. Abraham 

Midlothian, Virginia 




Dr. Alfred Burger 



Photograph of Professor Burger followed by his comments to the American Chemical Society 26th Medicinal
Chemistry Symposium on June 14, 1998. This was his last public appearance at a meeting of medicinal
chemists. As general chair of the 1998 ACS Medicinal Chemistry Symposium, the editor invited Professor
Burger to open the meeting. He was concerned that the young chemists would not know who he was and he 
might have an attack due to his battle with Parkinson's disease. These fears never were realized and his 
comments to the more than five hundred attendees drew a sustained standing ovation. The Professor was 93, 
and it was Mrs. Burger's 91st birthday. 
Opening Remarks



ACS 26th Medicinal Chemistry Symposium

June 14, 1998 
Alfred Burger 
University of Virginia 



It has been 46 years since the third Medicinal Chemistry Symposium met at the University of 
Virginia in Charlottesville in 1952. Today, the Virginia Commonwealth University welcomes 
you and joins all of you in looking forward to an exciting program. 

So many aspects of medicinal chemistry have changed in that half century that most of the 
new data to be presented this week would have been unexpected and unbelievable had they 
been mentioned in 1952. The upsurge in biochemical understandings of drug transport and 
drug action has made rational drug design a reality in many therapeutic areas and has made 
medicinal chemistry an independent science. We have our own journal, the best in the world, 
whose articles comprise all the innovations of medicinal researches. And if you look at the 
announcements of job opportunities in the pharmaceutical industry as they appear in 
Chemical & Engineering News, you will find in every issue more openings in medicinal
chemistry than in other fields of chemistry. Thus, we can feel the excitement of being part of 
this medicinal tidal wave, which has also been fed by the expansion of the needed research 
training provided by increasing numbers of universities. 

The ultimate beneficiary of scientific advances in discovering new and better therapeutic 
agents and understanding their modes of action is the patient. Physicians now can safely look 
forward to new methods of treatment of hitherto untreatable conditions. To the medicinal 
scientist all this has increased the pride of belonging to a profession which can offer predictable 
intellectual rewards. Our symposium will be an integral part of these developments.
1 INTRODUCTION

It has been nearly 40 years since the quantita- 
tive structure-activity relationship (QSAR) 
paradigm first found its way into the practice 
of agrochemistry, pharmaceutical chemistry, 
toxicology, and eventually most facets of 
chemistry (1). Its staying power may be attributed
to the strength of its initial postulate that
activity was a function of structure as described
by electronic attributes, hydrophobicity,
and steric properties, as well as the rapid
and extensive development in methodologies 
and computational techniques that have en- 
sued to delineate and refine the many vari- 
ables and approaches that define the para- 
digm. The overall goals of QSAR retain their 
original essence and remain focused on the 
predictive ability of the approach and its re- 
ceptiveness to mechanistic interpretation. 







Rigorous analysis and fine-tuning of indepen- 
dent variables has led to an expansion in de- 
velopment of molecular and atom-based de- 
scriptors, as well as descriptors derived from 
quantum chemical calculations and spectros- 
copy (2). The improvement in high-through- 
put screening procedures allows for rapid 
screening of large numbers of compounds un- 
der similar test conditions and thus minimizes 
the risk of combining variable test data from 
many sources. 

The formulation of thousands of equa- 
tions using QSAR methodology attests to a 
validation of its concepts and its utility in 
the elucidation of the mechanism of action of 
drugs at the molecular level and a more com- 
plete understanding of physicochemical phe- 
nomena such as hydrophobicity. It is now 
possible not only to develop a model for a 
system but also to compare models from a 
biological database and to draw analogies 
with models from a physical organic data- 
base (3). This process is dubbed model min- 
ing and it provides a sophisticated approach 
to the study of chemical-biological interac- 
tions. QSAR has clearly matured, although 
it still has a way to go. The previous review 
by Kubinyi has relevant sections covering 
portions of this chapter as well as an exten- 
sive bibliography recommended for a more 
complete overview (4). 

1.1 Historical Development of QSAR 

More than a century ago, Crum-Brown and 
Fraser expressed the idea that the physiologi- 
cal action of a substance was a function of its 
chemical composition and constitution (5). A 
few decades later, in 1893, Richet showed that
the cytotoxicities of a diverse set of simple or- 
ganic molecules were inversely related to their 
corresponding water solubilities (6). At the 
turn of the 20th century, Meyer and Overton 
independently suggested that the narcotic (depressant)
action of a group of organic compounds
paralleled their olive oil/water partition
coefficients (7, 8). In 1939 Ferguson
introduced a thermodynamic generalization 
to the correlation of depressant action with 
the relative saturation of volatile compounds 
in the vehicle in which they were administered 
(9). The extensive work of Albert, and Bell and
Roblin established the importance of ionization
of bases and weak acids in bacteriostatic
activity (10-12). Meanwhile on the physical 
organic front, great strides were being made in 
the delineation of substituent effects on or- 
ganic reactions, led by the seminal work of 
Hammett, which gave rise to the "sigma-rho" 
culture (13, 14). Taft devised a way for sepa- 
rating polar, steric, and resonance effects and 
introducing the first steric parameter, $E_s$ (15).
The contributions of Hammett and Taft to- 
gether laid the mechanistic basis for the devel- 
opment of the QSAR paradigm by Hansch and 
Fujita. In 1962 Hansch and Muir published 
their brilliant study on the structure-activity 
relationships of plant growth regulators and 
their dependency on Hammett constants and 
hydrophobicity (16). Using the octanol/water 
system, a whole series of partition coefficients 
were measured, and thus a new hydrophobic 
scale was introduced (17). The parameter $\pi$,
which is the relative hydrophobicity of a sub- 
stituent, was defined in a manner analogous to 
the definition of sigma (18). 

$$\pi_X = \log P_X - \log P_H \qquad (1.1)$$

$P_X$ and $P_H$ represent the partition coefficients
of a derivative and the parent molecule, re- 
spectively. Fujita and Hansch then combined 
these hydrophobic constants with Hammett's 
electronic constants to yield the linear Hansch 
equation and its many extended forms (19). 

$$\log 1/C = a\sigma + b\pi + ck \qquad (1.2)$$
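
As a minimal sketch of Equations 1.1 and 1.2, the following Python fragment computes a substituent constant from two partition coefficients and then evaluates a linear Hansch expression. The log P values for benzene and chlorobenzene (2.13 and 2.84) and the Hammett constant for a para chlorine (0.23) are commonly quoted values used here only for illustration. The function names, the coefficients a and b, and the constant term are hypothetical and are not taken from any published equation.

```python
def pi_constant(log_p_derivative: float, log_p_parent: float) -> float:
    """Equation 1.1: pi_X = log P_X - log P_H."""
    return log_p_derivative - log_p_parent


def linear_hansch(sigma: float, pi: float, a: float, b: float, constant: float) -> float:
    """Equation 1.2: log 1/C = a*sigma + b*pi + constant term."""
    return a * sigma + b * pi + constant


# Illustrative octanol/water log P values (assumed, not from this chapter).
LOG_P_BENZENE = 2.13
LOG_P_CHLOROBENZENE = 2.84

pi_cl = pi_constant(LOG_P_CHLOROBENZENE, LOG_P_BENZENE)
print(f"pi(Cl) = {pi_cl:.2f}")

# Hypothetical regression coefficients, chosen only to show the arithmetic.
log_inv_c = linear_hansch(sigma=0.23, pi=pi_cl, a=1.0, b=0.5, constant=3.0)
print(f"predicted log 1/C = {log_inv_c:.3f}")
```

In practice the coefficients would come from a regression over a congeneric series rather than being chosen by hand.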

Hundreds of equations later, the failure of lin- 
ear equations in cases with extended hydro- 
phobicity ranges led to the development of the 
Hansch parabolic equation (20): 

$$\log 1/C = a \cdot \log P - b(\log P)^2 + c\sigma + k \qquad (1.3)$$

The delineation of these models led to explo- 
sive development in QSAR analysis and re- 
lated approaches. The Kubinyi bilinear 
model is a refinement of the parabolic model 
and, in many cases, it has proved to be supe- 
rior (21). 
$$\log 1/C = a \cdot \log P - b \cdot \log(\beta \cdot P + 1) + k \qquad (1.4)$$
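
The parabolic and bilinear models both predict an optimum hydrophobicity, and a short sketch can make the difference concrete. The Python fragment below evaluates the two expressions and locates the optimum log P analytically by setting the derivative with respect to log P to zero. All coefficient values are hypothetical, the function names are introduced only for this illustration, and the electronic term is treated as an optional constant offset.

```python
import math


def parabolic(log_p, a, b, k, c=0.0, sigma=0.0):
    """Parabolic model: a*logP - b*(logP)^2 + c*sigma + k."""
    return a * log_p - b * log_p ** 2 + c * sigma + k


def bilinear(log_p, a, b, beta, k):
    """Bilinear model: a*logP - b*log10(beta*P + 1) + k."""
    p = 10.0 ** log_p
    return a * log_p - b * math.log10(beta * p + 1.0) + k


def parabolic_optimum(a, b):
    """d/d(logP) = a - 2*b*logP = 0 gives logP_o = a / (2b)."""
    return a / (2.0 * b)


def bilinear_optimum(a, b, beta):
    """a - b*beta*P/(beta*P + 1) = 0 gives P_o = a / (beta*(b - a)).

    A maximum exists only when b > a > 0 and beta > 0.
    """
    if not (b > a > 0 and beta > 0):
        raise ValueError("a maximum requires b > a > 0 and beta > 0")
    return math.log10(a / (beta * (b - a)))


# Hypothetical coefficients, chosen only for illustration.
print(parabolic_optimum(a=1.2, b=0.2))
print(bilinear_optimum(a=0.9, b=1.5, beta=0.01))
```

The bilinear form has independent slopes on either side of the optimum (a on the ascending side, a - b on the descending side), whereas the parabola is forced to be symmetric.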

Besides the Hansch approach, other method- 
ologies were also developed to tackle struc- 
ture-activity questions. The Free-Wilson ap- 
proach addresses structure-activity studies in 
a congeneric series as described in Equation 
1.5 (22). 

$$BA = \sum_i a_i X_i + u \qquad (1.5)$$

BA is the biological activity, $u$ is the average
contribution of the parent molecule, and $a_i$ is
the contribution of each structural feature; $X_i$
denotes the presence ($X_i = 1$) or absence ($X_i = 0$)
of a particular structural fragment. Limita- 
tions in this approach led to the more sophis- 
ticated Fujita-Ban equation that used the log- 
arithm of activity, which brought the activity 
parameter in line with other free energy-re- 
lated terms (23). 

$$\log BA = \sum_i G_i X_i + u \qquad (1.6)$$

In Equation 1.6, $u$ is defined as the calculated
biological activity value of the unsubstituted
parent compound of a particular series. $G_i$ represents
the biological activity contribution of
the substituents, whereas $X_i$ is assigned a
value of one when the substituent is present or 
zero when it is absent. Variations on this ac- 
tivity-based approach have been extended by 
Klopman et al. (24) and Enslein et al. (25). 
Topological methods have also been used to 
address the relationships between molecular 
structure and physical/biological activity. The 
minimum topological difference (MTD) 
method of Simon and the extensive studies on 
molecular connectivity by Kier and Hall have 
contributed to the development of quantita- 
tive structure property/activity relationships 
(26, 27). Connectivity indices based on hydro- 
gen-suppressed molecular structures are rich 
in information on branching, 3-atom frag- 
ments, the degree of substitution, proximity of 
substituents and length, and heteroatom of 
substituted rings. A method in its embryonic 
state of development uses both graph bond
distances and Euclidean distances among atoms
to calculate E-state values for each atom
in a molecule that is sensitive to conforma- 
tional structure. Recently, these electrotopo- 
logical indices that encode significant struc- 
tured information on the topological state of 
atoms and fragments as well as their valence 
electron content have been applied to biologi- 
cal and toxicity data (28). Other recent devel- 
opments in QSAR include approaches such as 
HQSAR, Inverse QSAR, and Binary QSAR 
(29-32). Improved statistical tools such as 
partial least squares (PLS) can handle situations
where the number of variables overwhelms
the number of molecules in a data set,
which may have collinear X-variables (33).
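
The additive scheme of the Fujita-Ban expression (Equation 1.6) can be sketched in a few lines of Python. The group contributions, the substituent labels, and the parent value below are hypothetical numbers invented for illustration; in a real analysis they would be obtained by regression over the measured activities of a congeneric series.

```python
def fujita_ban_log_ba(u, contributions, present):
    """Equation 1.6: log BA = sum_i G_i * X_i + u, where X_i is 1 or 0."""
    total = u
    for fragment, g_i in contributions.items():
        x_i = 1 if fragment in present else 0  # indicator variable X_i
        total += g_i * x_i
    return total


# Hypothetical group contributions G_i for substituents on a parent scaffold.
G = {"4-Cl": 0.40, "3-CH3": 0.15, "4-NO2": -0.30}
U_PARENT = 5.00  # hypothetical log BA of the unsubstituted parent

print(fujita_ban_log_ba(U_PARENT, G, present={"4-Cl", "3-CH3"}))
print(fujita_ban_log_ba(U_PARENT, G, present=set()))
```

With no substituent present, the expression reduces to the parent value u, which is the feature that distinguishes this form from the original Free-Wilson average.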

1.2 Development of Receptor Theory

The central theme of molecular pharmacol- 
ogy, and the underlying basis of SAR studies, 
has focused on the elucidation of the structure 
and function of drug receptors. It is an en- 
deavor that proceeds with unparalleled vigor, 
fueled by the developments in genomics. It is 
generally accepted that endogenous and exog- 
enous chemicals interact with a binding site 
on a specific macromolecular receptor. This in- 
teraction, which is determined by intermolec- 
ular forces, may or may not elicit a pharmaco- 
logical response depending on its eventual site 
of action. 

The idea that drugs interacted with specific 
receptors began with Langley, who studied the 
mutually antagonistic action of the alkaloids, 
pilocarpine and atropine. He realized that
both these chemicals interacted with some re- 
ceptive substance in the nerve endings of the 
gland cells (34). Paul Ehrlich defined the re- 
ceptor as the "binding group of the protoplas- 
mic molecule to which a foreign newly intro- 
duced group binds" (35). In 1905 Langley’s 
studies on the effects of curare on muscular 
contraction led to the first delineation of crit- 
ical characteristics of a receptor: recognition 
capacity for certain ligands and an amplifica- 
tion component that results in a pharmacolog- 
ical response (36). 

Receptors are mostly integral proteins em- 
bedded in the phospholipid bilayer of cell 
membranes. Rigorous treatment with deter- 
gents is needed to dissociate the proteins from 
the membrane, which often results in loss of 
integrity and activity. Pure proteins such as
enzymes also act as drug receptors. Their relative
ease of isolation and amplification has
made enzymes desirable targets in structure- 
based ligand design and QSAR studies. Nu- 
cleic acids comprise an important category of 
drug receptors. Nucleic acid receptors (apta- 
mers), which interact with a diverse number 
of small organic molecules, have been isolated 
by in vitro selection techniques and studied 
(37). Recent binary complexes provide insight 
into the molecular recognition process in 
these biopolymers and also establish the im- 
portance of the architecture of tertiary motifs 
in nucleic acid folding (38). Groove-binding li- 
gands such as lexitropsins hold promise as po- 
tential drugs and are thus suitable subjects for 
focused QSAR studies (39). 

Over the last 20 years, extensive QSAR 
studies on ligand-receptor interactions have 
been carried out with most of them focusing 
on enzymes. Two recent developments have 
augmented QSAR studies and established an 
attractive approach to the elucidation of the 
mechanistic underpinnings of ligand-receptor 
interactions: the advent of molecular graphics 
and the ready availability of X-ray crystallog- 
raphy coordinates of various binary and ter- 
nary complexes of enzymes with diverse li- 
gands and cofactors. Early studies with serine 
and thiol proteases (chymotrypsin, trypsin, 
and papain), alcohol dehydrogenase, and nu- 
merous dihydrofolate reductases (DHFR) not 
only established molecular modeling as a powerful
tool, but also helped clarify the extent of
the role of hydrophobicity in enzyme-ligand
interactions (40-44). Empirical evidence indicated
that the coefficients with the hydrophobic
term could be related to the degree of desolvation
of the ligand by critical amino acid
residues in the binding site of an enzyme. Total
desolvation, as characterized by binding in
a deep crevice/pocket, resulted in coefficients
of approximately 1.0 (0.9-1.1) (44). An extension
of this agreement between the mathematical
expression and structure as determined by
X-ray crystallography led to the expectation
that the binding of a set of substituents on the
surface of an enzyme would yield a coefficient
of about 0.5 (0.4-0.6) in the regression equation,
indicative of partial desolvation.



Probing of various enzymes by different li- 
gands also aided in dispelling the notion of 
Fischer's rigid lock-and-key concept, in which 
the ligand (key) fits precisely into a receptor 
(lock). Thus, a "negative" impression of the 
substrate was considered to exist on the en- 
zyme surface (geometric complementarity). 
Unfortunately, this rigid model fails to ac- 
count for the effects of allosteric ligands, and 
this encouraged the evolution of the induced- 
fit model. Thus, "deformable" lock-and-key 
models have gained acceptance on the basis of 
structural studies, especially NMR (45). 

It is now possible to isolate membrane- 
bound receptors, although it is still a challenge 
to delineate their chemistry, given that sepa- 
ration from the membrane usually ensures 
loss of reactivity. Nevertheless, great ad- 
vances have been made in this arena, and the 
three-dimensional structures of some mem- 
brane-bound proteins have recently been elu- 
cidated. To gain an appreciation for mecha- 
nisms of ligand-receptor interactions, it is 
necessary to consider the intermolecular 
forces at play. Considering the low concentra- 
tion of drugs and receptors in the human body, 
the law of mass action cannot account for the 
ability of a minute amount of a drug to elicit a 
pronounced pharmacological effect. The driving
force for such an interaction may be attributed
to the low energy state of the drug-receptor
complex: $K_D$ = [Drug][Receptor]/[Drug-Receptor Complex].
Thus, the biological activity of a drug is determined
by its affinity for the receptor, which is measured by its
$K_D$, the dissociation constant at equilibrium. A
smaller $K_D$ implies a large concentration of
the drug-receptor complex and thus a greater
affinity of the drug for the receptor. The latter
property is promoted and stabilized by mostly
noncovalent interactions sometimes augmented
by a few covalent bonds. The spontaneous
formation of a bond between atoms results
in a decrease in free energy; that is, $\Delta G$ is
negative. The change in free energy $\Delta G$ is related
to the equilibrium constant $K_{eq}$:

$$\Delta G^\circ = -RT \ln K_{eq} \qquad (1.7)$$

Thus, small changes in $\Delta G^\circ$ can have a profound
effect on equilibrium constants.
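
The sensitivity of the equilibrium constant to the standard free energy change can be checked numerically. The sketch below inverts Equation 1.7 under the assumptions of a temperature of 298.15 K and a gas constant of 1.987 x 10^-3 kcal/(mol K). The free energy values in the loop are hypothetical and the function names are introduced only for this example.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def equilibrium_constant(delta_g0, temperature=298.15):
    """Invert Equation 1.7: K = exp(-dG0 / (R*T)), with dG0 in kcal/mol."""
    return math.exp(-delta_g0 / (R_KCAL * temperature))


def delta_g0(k_eq, temperature=298.15):
    """Equation 1.7: dG0 = -R*T*ln(K)."""
    return -R_KCAL * temperature * math.log(k_eq)


# Free energy change that corresponds to a 10-fold change in K at 25 C.
per_decade = -delta_g0(10.0)
print(f"{per_decade:.2f} kcal/mol per 10-fold change in K")

for dg in (-8.0, -9.0, -10.0):  # hypothetical free energy changes
    print(dg, f"{equilibrium_constant(dg):.3g}")
```

Under these assumptions a change of roughly 1.4 kcal/mol, comparable to a single weak noncovalent contact, shifts the equilibrium constant by an order of magnitude.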
Table 1.1 Types of Intermolecular Forces

Bond Type                  Bond Strength (kcal/mol)   Example
1. Covalent                40-140                     CH3CH2O-H
2. Ionic (electrostatic)   5                          R4N+ ... -O-C(=O)-
3. Hydrogen                1-10                       (diagram)
4. Dipole-dipole           1                          R3N: ... C=O
5. van der Waals           0.5-1                      C ... C
6. Hydrophobic             1


In the broadest sense, these "bonds" would 
include covalent, ionic, hydrogen, dipole-di- 
pole, van der Waals, and hydrophobic interac- 
tions. Most drug-receptor interactions consti- 
tute a combination of the bond types listed in 
Table 1.1, most of which are reversible under 
physiological conditions. 

Covalent bonds are not as important in 
drug-receptor binding as noncovalent interac- 
tions. Alkylating agents in chemotherapy tend 
to react and form an immonium ion, which 
then alkylates proteins, preventing their nor- 
mal participation in cell divisions. Baker's 
concept of active-site-directed irreversible inhibitors
was well established by covalent bond
formation between Baker's antifolate and dihydrofolate
reductase (46).

Ionic (electrostatic) interactions are formed 
between ions of opposite charge with energies 
that are nominal and that tend to fall off with 
distance. They are ubiquitous and because 
they act across long distances, they play a 
prominent role in the actions of ionizable 
drugs. The strength of an electrostatic force is 
directly dependent on the charge of each ion 
and inversely dependent on the dielectric con- 
stant of the solvent and the distance between 
the charges. 
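
The dependence on charge, dielectric constant, and distance described above is that of Coulomb's law. A minimal sketch, assuming point charges in a uniform dielectric: the conversion factor of about 332 kcal Å/(mol e^2) and the two dielectric values (80 for bulk water, 4 for a low-dielectric environment) are assumptions introduced for illustration and do not come from the text.

```python
COULOMB_KCAL = 332.06  # assumed conversion factor, kcal*angstrom/(mol*e^2)


def coulomb_energy(q1, q2, distance, dielectric):
    """Energy (kcal/mol) of two point charges (in units of e) separated by
    `distance` angstroms in a medium of the given dielectric constant."""
    return COULOMB_KCAL * q1 * q2 / (dielectric * distance)


# Hypothetical ion pair: a cation (+1) and an anion (-1), 5 angstroms apart.
in_water = coulomb_energy(+1, -1, 5.0, dielectric=80.0)
in_pocket = coulomb_energy(+1, -1, 5.0, dielectric=4.0)
print(f"bulk water:     {in_water:.2f} kcal/mol")
print(f"low dielectric: {in_pocket:.2f} kcal/mol")
```

The same ion pair is far more stabilizing in a low-dielectric environment, which is one reason the effective strength of an ionic interaction depends so heavily on where it occurs.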

Hydrogen bonds are ubiquitous in nature: 
their multiple presence contributes to the sta- 



bility of the (ahelix and base-pairing in DNA. 
Hydrogen bonding is based on an electrostatic 
interaction between the nonbonding electrons 
of a heteroatom (e.g., N, O, S) and the electron-deficient
hydrogen atom of an -OH, -SH,
or -NH group. Hydrogen bonds are strongly
directional, highly dependent on the net de- 
gree of solvation, and rather weak, having en- 
ergies ranging from 1 to 10 kcal/mol (47, 48). 
Bonds with this type of strength are of critical 
importance because they are stable enough to 
provide significant binding energy but weak 
enough to allow for quick dissociation. The 
greater electronegativity of atoms such as ox- 
ygen, nitrogen, sulfur, and halogen, compared 
to that of carbon, causes bonds between these 
atoms to have an asymmetric distribution of 
electrons, which results in the generation of 
electronic dipoles. Given that so many func- 
tional groups have dipole moments, ion-dipole 
and dipole-dipole interactions are frequent. 
The energy of dipole-dipole interactions can 
be described by Equation 1.8, where $\mu$ is the
dipole moment, $\theta$ is the angle between the two
poles of the dipole, $D$ is the dielectric constant
of the medium, and $r$ is the distance between
the charges involved in the dipole.

$$E = \frac{2\mu_1\mu_2\cos\theta_1\cos\theta_2}{Dr^3} \qquad (1.8)$$
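
The distance and orientation dependence of the dipole-dipole expression can be explored with a short function. This is a sketch in arbitrary but consistent units, written on the assumption that the energy varies with the inverse cube of the distance. The dipole moments, angles, and distances are hypothetical inputs, and the function name is introduced only for illustration.

```python
import math


def dipole_dipole_energy(mu1, mu2, theta1_deg, theta2_deg, dielectric, r):
    """E = 2*mu1*mu2*cos(theta1)*cos(theta2) / (D * r^3), arbitrary units."""
    cos1 = math.cos(math.radians(theta1_deg))
    cos2 = math.cos(math.radians(theta2_deg))
    return 2.0 * mu1 * mu2 * cos1 * cos2 / (dielectric * r ** 3)


# Hypothetical aligned dipoles at two separations in the same medium.
near = dipole_dipole_energy(1.5, 1.5, 0.0, 0.0, dielectric=4.0, r=3.0)
far = dipole_dipole_energy(1.5, 1.5, 0.0, 0.0, dielectric=4.0, r=6.0)
print(near / far)  # ratio of energies when the distance is doubled

# Rotating one dipole to 90 degrees removes the interaction in this model.
print(dipole_dipole_energy(1.5, 1.5, 90.0, 0.0, dielectric=4.0, r=3.0))
```

Doubling the separation reduces the energy eightfold, a steeper fall-off than the inverse-distance dependence of an ion pair.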







Although electrostatic interactions are 
generally restricted to polar molecules, there 
are also strong interactions between nonpolar 
molecules over small intermolecular dis- 
tances. Dispersion or London/van der Waals 
forces are the universal attractive forces be- 
tween atoms that hold nonpolar molecules to- 
gether in the liquid phase. They are based on 
polarizability and these fluctuating dipoles or 
shifts in electron clouds of the atoms tend to 
induce opposite dipoles in adjacent molecules, 
resulting in a net overall attraction. The en- 
ergy of this interaction decreases very rapidly 
in proportion to $1/r^6$, where $r$ is the distance
separating the two molecules. These van der 
Waals forces operate at a distance of about 
0.4-0.6 nm and exert an attraction force of 
less than 0.5 kcal/mol. Yet, although individ- 
ual van der Waals forces make a low energy 
contribution to an event, they become signifi- 
cant and additive when summed up over a 
large area with close surface contact of the 
atoms. 
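
Both points made above, the steep distance dependence and the additivity over many contacts, can be illustrated with simple arithmetic. The sketch assumes a pure inverse sixth power dependence. The number of contacts and the energy per contact are hypothetical values chosen to lie below the 0.5 kcal/mol quoted in the text.

```python
def relative_dispersion_energy(r, r_ref):
    """Dispersion energy at distance r relative to r_ref, assuming E ~ 1/r^6."""
    return (r_ref / r) ** 6


# Fraction of the attraction remaining when the separation grows from
# 0.4 nm to 0.6 nm.
fraction = relative_dispersion_energy(0.6, 0.4)
print(f"{fraction:.3f}")

# Additivity over a contact surface (hypothetical numbers).
contacts = 30        # number of close atom-atom contacts
per_contact = 0.3    # kcal/mol per contact
total = contacts * per_contact
print(f"{total:.1f} kcal/mol")
```

A 50% increase in separation leaves less than a tenth of the attraction, which is why close surface complementarity is needed before these forces contribute appreciably.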

Hydrophobicity refers to the tendency of 
nonpolar compounds to transfer from an 
aqueous phase to an organic phase (49, 50). 
When a nonpolar molecule is placed in water, 
it gets solvated by a "sweater" of water mole- 
cules ordered in a somewhat icelike manner. 
This increased order in the water molecules 
surrounding the solute results in a loss of en- 
tropy. Association of hydrocarbon molecules 
leads to a "squeezing out" of the structured 
water molecules. The displaced water becomes 
bulk water, less ordered, resulting in a gain in 
entropy, which provides the driving force for 
what has been referred to as a hydrophobic 
bond. Although this is a generally accepted 
view of hydrophobicity, the hydration of apo- 
lar molecules and the noncovalent interac- 
tions between these molecules in water are 
still poorly understood and thus the source of 
continued examination (51-53). 

Because noncovalent interactions are gen- 
erally weak, cooperativity by several types of 
interactions is essential for overall activity. 
Enthalpy terms will be additive, but once the 
first interaction occurs, translational entropy 
is lost. This results in a reduced entropy loss in 
the second interaction. The net result is that 
eventually several weak interactions combine 
to produce a strong interaction. One can safely
state that it is the involvement of myriad interactions
that contributes to the overall selectivity
of drug-receptor interactions.

2 TOOLS AND TECHNIQUES OF QSAR

2.1 Biological Parameters 

In QSAR analysis, it is imperative that the 
biological data be both accurate and precise to 
develop a meaningful model. It must be real- 
ized that any resulting QSAR model that is 
developed is only as valid statistically as the 
data that led to its development. The equilib- 
rium constants and rate constants that are 
used extensively in physical organic chemistry 
and medicinal chemistry are related to free 
energy values $\Delta G$. Thus, standard
biological equilibrium constants such as
$K_i$ or $K_m$ should be used in QSAR studies.
Likewise only standard rate constants should 
be deemed appropriate for a QSAR analysis. 
Percentage activities (e.g., % inhibition of 
growth at certain concentrations) are not ap- 
propriate biological endpoints because of the 
nonlinear characteristic of dose-response rela- 
tionships. These types of endpoints may be 
transformed to equieffective molar doses. 
Only equilibrium and rate constants pass 
muster in terms of the free-energy relation- 
ships or influence on QSAR studies. Biological 
data are usually expressed on a logarithmic 
scale because of the linear relationship be- 
tween response and log dose in the midregion 
of the log dose-response curve. Inverse loga- 
rithms for activity (log 1/C) are used so that 
higher values are obtained for more effective 
analogs. Various types of biological data have 
been used in QSAR analysis. A few common 
endpoints are outlined in Table 1.2. 
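
The conversion of an equieffective concentration to the log 1/C scale can be sketched as follows. The helper name, the unit table, and the example concentrations are assumptions made for illustration. The only requirement taken from the text is that the concentration be expressed in molar terms before the inverse logarithm is taken.

```python
import math

# Multipliers that convert common concentration units to mol/L.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def log_inverse_c(concentration, unit="M"):
    """Return log10(1/C) for a concentration given in the stated unit."""
    molar = concentration * UNIT_TO_MOLAR[unit]
    if molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(molar)


# Hypothetical IC50 values for three analogs.
for value, unit in ((1.0, "uM"), (50.0, "nM"), (2.5, "mM")):
    print(value, unit, f"{log_inverse_c(value, unit):.2f}")
```

More potent analogs, which act at lower concentrations, receive larger values on this scale, in line with the convention described above.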

Biological data should pertain to an aspect 
of biological/biochemical function that can be 
measured. The events could be occurring in 
enzymes, isolated or bound receptors, in cellu- 
lar systems, or whole animals. Because there 
is considerable variation in biological re- 
sponses, test samples should be run in dupli- 
cate or preferably triplicate, except in whole 
animal studies where assay conditions (e.g., 
plasma concentrations of a drug) preclude 
such measurements. 
Table 1.2 Types of Biological Data Utilized in QSAR Analysis

Source of Activity               Biological Parameters
1. Isolated receptors
   Rate constants                Log k_cat; Log k_cat/K_m; Log k
   Michaelis-Menten constants    Log 1/K_m
   Inhibition constants          Log 1/K_i
   Affinity data                 pA_2; pA_1
2. Cellular systems
   Inhibition constants          Log 1/IC50
   Cross resistance              Log CR
   In vitro biological data      Log 1/C
   Mutagenicity states           Log TA98
3. "In vivo" systems
   Bioconcentration factor       Log BCF
   In vivo reaction rates        Log I (Induction)
   Pharmacodynamic rates         Log T (total clearance)


It is also important to design a set of molecules
that will yield a range of values in terms
of biological activities. It is understandable 
that most medicinal chemists are reluctant to 
synthesize molecules with poor activity, even 
though these data points are important in de- 
veloping a meaningful QSAR. Generally, the 
larger the range (>2 log units) in activity, the 
easier it is to generate a predictive QSAR. This 
kind of equation is more forgiving in terms of 
errors of measurement. A narrow range in bi- 
ological activity is less forgiving in terms of 
accuracy of data. Another factor that merits 
consideration is the time structure. Should a 
particular reading be taken after 48 or 72 h? 
Knowledge of cell cycles in cellular systems or 
biorhythms in animals would be advanta- 
geous. 

Each single step of drug transport, binding, 
and metabolism involves some form of parti- 
tioning between an aqueous compartment and 
a nonaqueous phase, which could be a mem- 
brane, serum protein, receptor, or enzyme. In 
the case of isolated receptors, the endpoint is 
clear-cut and the critical step is evident. But in 
more complex systems, such as cellular sys- 
tems or whole animals, many localized steps 
could be involved in the random-walk process 
and the eventual interaction with a target. 



Usually the observed biological activity is re- 
flective of the slow step or the rate-determin- 
ing step. 

To determine a defined biological response 
(e.g., IC50), a dose-response curve is first
established. Usually six to eight concentrations
are tested to yield percentages of activity or 
inhibition between 20 and 80%, the linear por- 
tion of the curve. Using the curves, the dose 
responsible for an established effect can easily 
be determined. This procedure is meaningful 
if, at the time the response is measured, the 
system is at equilibrium, or at least under 
steady- state conditions. 
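
One way to read an IC50 off the linear portion of a log dose-response curve is sketched below in Python. This is a minimal illustration, not the procedure of any particular assay: the concentrations and percent responses are hypothetical, and the straight-line fit over the 20 to 80% window is an assumption made for simplicity (a full sigmoidal fit is often preferred in practice).

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope*x + intercept."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


def ic50_from_linear_region(conc_molar, pct_response):
    """Estimate the 50% effect concentration from points between 20 and 80%."""
    points = [(math.log10(c), p)
              for c, p in zip(conc_molar, pct_response)
              if 20.0 <= p <= 80.0]
    if len(points) < 2:
        raise ValueError("need at least two points in the 20-80% window")
    xs, ys = zip(*points)
    slope, intercept = fit_line(xs, ys)
    return 10.0 ** ((50.0 - intercept) / slope)


# Hypothetical seven-point dose-response data (mol/L, % inhibition)
conc = [1e-8, 1e-7, 3e-7, 1e-6, 3e-6, 1e-5, 1e-4]
pct = [4.0, 21.0, 35.0, 50.0, 65.0, 79.0, 97.0]
ic50 = ic50_from_linear_region(conc, pct)
print(f"IC50 = {ic50:.2e} M, log 1/C = {-math.log10(ic50):.2f}")
```

Points outside the 20 to 80% window are ignored because they fall on the flat ends of the curve, where small measurement errors translate into large errors in dose.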

Other approaches have been used to apply 
the additivity concept and ascertain the bind- 
ing energy contributions of various substitu- 
ent (R) groups. Fersht et al. have measured 
the binding energies of various alkyl groups to 
aminoacyl-tRNA synthetases (54). Thus the 
$\Delta G$ values for methyl, ethyl, isopropyl, and
thio substituents were determined to be 3.2, 
6.5, 9.6, and 5.4 kcal/mol, respectively. 

An alternative, generalized approach to de- 
termining the energies of various drug-recep- 
tor interactions was developed by Andrews et 
al. (55), who statistically examined the drug- 
receptor interactions of a diverse set of molecules
in aqueous solution. Using Equation 1.9,
a relationship was established between $\Delta G$
and $E_X$ (intrinsic binding energy), $E_{DOF}$ (energy
of average entropy loss), and $\Delta S_{r,t}$ (energy
of rotational and translational entropy loss).

$$\Delta G = T\Delta S_{r,t} + n_{DOF}E_{DOF} + \sum n_X E_X \tag{1.9}$$

$\sum n_X E_X$ denotes the sum of the intrinsic binding
energies of the functional groups, of which $n_X$
are present in each drug in the set. Using
Equation 1.9, the average binding energies for
various functional groups were calculated.
These energies followed a particular trend,
with charged groups showing stronger interactions
and nonpolar entities, such as $sp^2$ and $sp^3$
carbons, contributing very little. The applicability
of this approach to specific drug-receptor
interactions remains to be seen.

2.2 Statistical Methods: Linear 
Regression Analysis 

The most widely used mathematical tech- 
nique in QSAR analysis is multiple regression 
analysis (MRA). We will consider some of the
basic tenets of this approach to gain a firm
understanding of the statistical procedures
that define a QSAR. Regression analysis is a
powerful means for establishing a correlation
between independent variables and a dependent
variable such as biological activity (56).

$$Y_i = b + aX_i + E_i \tag{1.10}$$

Certain assumptions are made with regard
to this procedure (57):

1. The independent variables, which in this
case usually include the physicochemical
parameters, are measured without error.
Unfortunately, this is not always the case,
although the error in these variables is
small compared to that in the dependent
variable.

2. For any given value of X, the Y values are
independent and follow a normal distribution.
The error term $E_i$ possesses a normal
distribution with a mean of zero.

3. The expected mean value for the variable
Y, for all values of X, lies on a straight line.

4. The variance around the regression line is
constant.

The "best" straight line for the
model $Y_i = b + aX_i + E_i$ is drawn through
the data points, such that the sum of the
squares of the vertical distances from the
points to the line is minimized. $Y_{obs}$ represents
the value of the observed data point
and $Y_{calc}$ is the predicted value on the line.
The sum of squares is $SS = \sum (Y_{obs} - Y_{calc})^2$.

$$Y_{obs} = aX_i + b + E_i \tag{1.11}$$

$$Y_{calc} = aX_i + b \tag{1.12}$$

$$E_i = Y_{obs} - aX_i - b \tag{1.13}$$

$$\sum_{i=1}^{n} \Delta^2 = SS = \sum_{i=1}^{n} (Y_{obs} - Y_{calc})^2 \tag{1.14}$$

$$\text{Thus, } SS = \sum_{i=1}^{n} (Y_{obs} - aX_i - b)^2 \tag{1.15}$$

Expanding Equation 1.15, we obtain

$$SS = \sum_{i=1}^{n} \left( Y_{obs}^2 - Y_{obs}aX_i - Y_{obs}b - aX_iY_{obs} + a^2X_i^2 + aX_ib - bY_{obs} + abX_i + b^2 \right) \tag{1.16}$$

Taking the partial derivative of Equation 1.14
with respect to b and then with respect to a
results in Equations 1.17 and 1.18.

$$\frac{\partial SS}{\partial b} = \sum_{i=1}^{n} -2(Y_{obs} - b - aX_i) \tag{1.17}$$

$$\frac{\partial SS}{\partial a} = \sum_{i=1}^{n} -2X_i(Y_{obs} - b - aX_i) \tag{1.18}$$

SS is minimized with respect to b and a by setting
both derivatives to zero; dividing by -2 yields the
normal Equations 1.19 and 1.20.

$$\sum_{i=1}^{n} (Y_{obs} - b - aX_i) = 0 \tag{1.19}$$

$$\sum_{i=1}^{n} X_i(Y_{obs} - b - aX_i) = 0 \tag{1.20}$$

These "normal equations" can be rewritten as
follows:

$$b\sum_{i=1}^{n} X_i + a\sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} X_iY_{obs} \tag{1.21}$$

$$nb + a\sum_{i=1}^{n} X_i = \sum_{i=1}^{n} Y_{obs} \tag{1.22}$$

The solution of these simultaneous equations
yields a and b. These procedures have been
examined in detail elsewhere (19, 58-60). The
simple example in Table 1.3 illustrates the
nuances of a linear regression analysis.
Table 1.3 Antibacterial Activity of N'-(R-phenyl)sulfanilamides

Compound        $\sigma$ (X)      Observed BA (Y)

1. 4-CH3        -0.17             4.66
2. 4-H           0                4.80
3. 4-Cl          0.23             4.89
4. 2-Cl          0.23             5.55
5. 2-NO2         0.78             6.00
6. 4-NO2         0.78             6.00

k = no. of variables = 1
n = no. of data points = 6
$\sum X = 1.85$
$\sum Y = 31.90$
$\sum X^2 = 1.352$
$\sum Y^2 = 171.45$
$\sum XY = 10.968$

For linear regression analysis, $Y = aX + b$

$a = (n\sum XY - \sum X \sum Y)/(n\sum X^2 - (\sum X)^2) = 1.45$
$b = (\sum Y - a\sum X)/n = 4.869$
$r^2 = 1 - \sum\Delta^2/(\sum Y^2 - (\sum Y)^2/n) = 0.875 \;\therefore\; r = 0.935$
$s^2 = (1 - r^2)(\sum Y^2 - (\sum Y)^2/n)/(n - k - 1) = 0.058 \;\therefore\; s = 0.240$
$F = r^2(n - k - 1)/k(1 - r^2) = 28.52$

The correlation coefficient r, the total variance
$SS_T$, the unexplained variance SSQ,
and the standard deviation s are defined as
follows:

$$r^2 = 1 - \frac{\sum \Delta^2}{SS_T} \tag{1.23}$$

$$SS_T = \sum (Y_{obs} - Y_{mean})^2 = \sum Y^2 - \frac{(\sum Y)^2}{n} \tag{1.24}$$

$$\sum \Delta^2 = SSQ = \sum (Y_{obs} - Y_{calc})^2 \tag{1.25}$$

$$s = \sqrt{\frac{\sum \Delta^2}{n - k - 1}} = \sqrt{\frac{SSQ}{n - k - 1}} \tag{1.26}$$

The correlation coefficient r is a measure of
the quality of fit of the model. Its square, $r^2$,
represents the fraction of the variance in the data
that is explained by the model. In an ideal situation one
would want the correlation coefficient to be
equal to or approach 1, but in reality, because
of the complexity of biological data, any value
above 0.90 is adequate. The standard deviation
is an absolute measure of the quality of fit.
Ideally s should approach zero, but in experimental
situations, this is not so. It should be
small but it cannot have a value lower than the
standard deviation of the experimental data.
The magnitude of s may be attributed to some
experimental error in the data as well as
imperfections in the biological model. A larger
data set and a smaller number of variables
generally lead to lower values of s. The F value
is often used as a measure of the level of
statistical significance of the regression model. It
is defined as denoted in Equation 1.27.

$$F_{k_2 - k_1,\; n - k_2 - 1} = \frac{(SS_1 - SS_2)}{(k_2 - k_1)} \cdot \frac{(n - k_2 - 1)}{SS_2} \tag{1.27}$$




A larger value of F implies a more significant 
correlation has been reached. The confidence 
intervals of the coefficients in the equation re- 
veal the significance of each regression term in 
the equation. 
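
The regression statistics discussed above can be reproduced with a few lines of Python. The sketch below uses the six data points of Table 1.3 and the closed-form solution of the normal equations for one independent variable; the function name and the returned dictionary are choices made for this illustration only. Because the fit works from the unrounded data, the results should land close to, but not exactly on, the values printed with Table 1.3.

```python
import math

# Data from Table 1.3: sigma (X) and observed biological activity (Y)
sigma = [-0.17, 0.00, 0.23, 0.23, 0.78, 0.78]
activity = [4.66, 4.80, 4.89, 5.55, 6.00, 6.00]


def simple_linear_regression(xs, ys):
    """Fit Y = aX + b by least squares and return the fit statistics."""
    n, k = len(xs), 1  # k = number of independent variables
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Solution of the two normal equations for slope a and intercept b
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b = (sum_y - a * sum_x) / n

    ss_total = sum_yy - sum_y ** 2 / n                        # total variance
    ss_resid = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))  # unexplained
    r2 = 1.0 - ss_resid / ss_total
    s = math.sqrt(ss_resid / (n - k - 1))
    f = r2 * (n - k - 1) / (k * (1.0 - r2))
    return {"a": a, "b": b, "r2": r2, "r": math.sqrt(r2), "s": s, "F": f}


fit = simple_linear_regression(sigma, activity)
for name, value in fit.items():
    print(f"{name} = {value:.3f}")
```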

To obtain a statistically sound QSAR, it is 
important that certain caveats be kept in 
mind. One needs to be cognizant about col- 
linearity between variables and chance corre- 
lations. Use of a correlation matrix ensures 
that variables of significance and/or interest 
are orthogonal to each other. With the rapid 
proliferation of parameters, caution must be 
exercised in amassing too many variables for a 
QSAR analysis. Topliss has elegantly demon- 
strated that there is a high risk of ending up 
with a chance correlation when too many vari- 
ables are tested (62). 
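
A correlation matrix check of the kind mentioned above can be sketched as follows. The descriptor values are hypothetical, and the cutoff of |r| = 0.8 for flagging a collinear pair is an assumption chosen for illustration, not a value given in the text.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(descriptors, threshold=0.8):
    """Return (name1, name2, r) for every descriptor pair with |r| >= threshold."""
    flagged = []
    for (name1, col1), (name2, col2) in combinations(descriptors.items(), 2):
        r = pearson_r(col1, col2)
        if abs(r) >= threshold:
            flagged.append((name1, name2, r))
    return flagged


# Hypothetical descriptor columns for five analogs (illustration only)
descriptors = {
    "pi": [0.0, 0.5, 1.0, 1.5, 2.0],
    "MR": [1.0, 2.1, 2.9, 4.2, 5.0],
    "sigma": [0.2, -0.1, 0.3, -0.1, 0.2],
}

for name1, name2, r in collinear_pairs(descriptors):
    print(f"{name1} and {name2} are collinear (r = {r:.3f})")
```

Any flagged pair should not appear together in the same equation, because the regression cannot separate their contributions.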

Outliers in QSAR model generation 
present their own problems. If they are badly 
fit by the model (off by more than 2 standard 
deviations), they should be dropped from the 
data set, although their elimination should be 
noted and addressed. Their aberrant behavior 
may be attributed to inaccuracies in the test- 
ing procedure (usually dilution errors) or un- 
usual behavior. They often provide valuable 
information in terms of the mechanistic inter- 
pretation of a QSAR model. They could be par- 
ticipating in some intermolecular interaction 
that is not available to other members of the 
data set or have a drastic change in mecha- 
nism. 
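
The two-standard-deviation criterion for outliers can be applied mechanically, as in the minimal Python sketch below for a one-variable model. The data are hypothetical, and the function only flags candidates; whether a flagged compound is dropped remains a judgment that should be noted and addressed as described above.

```python
import math


def flag_outliers(xs, ys, n_sd=2.0):
    """Return indices of points whose residual exceeds n_sd standard deviations."""
    n, k = len(xs), 1
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b = (sum_y - a * sum_x) / n

    residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(res ** 2 for res in residuals) / (n - k - 1))
    return [i for i, res in enumerate(residuals) if abs(res) > n_sd * s]


# Hypothetical series: ten analogs on a line, one aberrant measurement
x_values = [float(i) for i in range(10)]
y_values = [float(i) for i in range(10)]
y_values[5] += 5.0

print("Candidate outliers (indices):", flag_outliers(x_values, y_values))
```

Note that the outlier itself inflates s, so in small data sets a single aberrant point can mask itself; refitting without the flagged point is a common follow-up.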

2.3 Compound Selection 

In setting up to run a QSAR analysis, com- 
pound selection is an important angle that 
needs to be addressed. One of the earliest 
manual methods was an approach devised by 
Craig, which involves two-dimensional plots of 
important physicochemical properties. Care is 
taken to select substituents from all four 
quadrants of the plot (63). The Topliss opera- 
tional scheme allows one to start with two 
compounds and construct a potency tree that 
grows branches as the substituent set is ex- 
panded in a stepwise fashion (64). Topliss 
later proposed a batchwise scheme including 
certain substituents such as the 3,4-Cl2, 4-Cl,
4-CH3, 4-OCH3, and 4-H analogs (65). Other
methods of manual substituent selection in- 
clude the Fibonacci search method, sequential 
simplex strategy, and parameter focusing by 
Magee (66-68). 

One of the earliest computer-based and statistical
selection methods, cluster analysis, was
devised by Hansch to accelerate the process
and increase the diversity of the substituents (1). Newer
methodologies include D-optimal designs,
which focus on the use of det(X'X), the determinant
of the variance-covariance matrix. This determinant
is a single number, which is
maximized for compounds expressing maximum
variance and minimum covariance (69-71).
A combination of fractional factorial design
in tandem with a principal property
approach has proven useful in QSAR (72). Ex- 
tensions of this approach using multivariate 
design have shown promise in environmental 
QSAR with nonspecific responses, where the 
clusters overlap and a cluster-based design ap- 
proach has to be used (73). With strongly clus- 
tered data containing several classes of com- 
pounds, a new strategy involving local 
multivariate designs within each cluster is de- 
scribed. The chosen compounds from the local 



designs are grouped together in the overall 
training set that is representative of all clus- 
ters (74). 
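
The D-optimal idea mentioned above, choosing the compounds that maximize det(X'X), can be sketched with an exhaustive search in Python. The substituent constants below are approximate illustrative values, and the brute-force search is an assumption made for clarity; practical D-optimal software uses exchange algorithms rather than enumerating every subset.

```python
from itertools import combinations


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [list(map(float, row)) for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def information_matrix(design_rows):
    """Return X'X for a design matrix given as a list of rows."""
    n_cols = len(design_rows[0])
    return [[sum(row[i] * row[j] for row in design_rows) for j in range(n_cols)]
            for i in range(n_cols)]


def d_optimal_subset(candidates, size):
    """Pick the subset of candidates that maximizes det(X'X)."""
    best_names, best_det = None, -1.0
    for names in combinations(candidates, size):
        # Each row: intercept term followed by the descriptor values
        rows = [[1.0, *candidates[name]] for name in names]
        det = determinant(information_matrix(rows))
        if det > best_det:
            best_names, best_det = names, det
    return best_names, best_det


# Approximate (sigma, pi) values, for illustration only
substituents = {
    "H": (0.00, 0.00), "4-Cl": (0.23, 0.71), "4-CH3": (-0.17, 0.56),
    "4-OCH3": (-0.27, -0.02), "4-NO2": (0.78, -0.28), "3,4-Cl2": (0.60, 1.25),
}
chosen, score = d_optimal_subset(substituents, size=4)
print("Selected:", chosen, "det(X'X) =", round(score, 4))
```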

3 PARAMETERS USED IN QSAR 

3.1 Electronic Parameters 

Parameters are of critical importance in determining
the types of intermolecular forces that
underlie drug-receptor interactions. The three
major types of parameters that were initially
suggested and still hold sway are electronic,
hydrophobic, and steric in nature (20, 75).
Extensive studies using electronic parameters
reveal that electronic attributes of molecules
are intimately related to their chemical
reactivities and biological activities. A search of a
computerized QSAR database reveals the
following: the common Hammett constants ($\sigma$,
$\sigma^+$, $\sigma^-$) account for 7000/8500 equations in
the Physical organic chemistry (PHYS) database
and nearly 1600/8000 in the Biology
(BIO) database, whereas quantum chemical
indices such as HOMO, LUMO, BDE, and
polarizability appear in 100 equations in the BIO
database (76).

The extent to which a given reaction responds
to electronic perturbation constitutes
a measure of the electronic demands of that
reaction, which is determined by its mechanism.
The introduction of substituent groups
into the framework and the subsequent alteration
of reaction rates helps delineate the
overall mechanism of reaction. The electronic
role of substituents on rate constants was first
examined by Burckhardt
and firmly established by Hammett (13, 14,
77, 78). Hammett employed, as a model reaction,
the ionization in water of substituted
benzoic acids and determined their equilibrium
constants $K_a$ (see Equation 1.28). This
led to an operational definition of $\sigma$, the
substituent constant. It is a measure of the size of
the electronic effect for a given substituent
and represents a measure of electronic charge
distribution in the benzene nucleus.

$$\mathrm{X{-}C_6H_4{-}COOH} + \mathrm{H_2O} \rightleftharpoons \mathrm{X{-}C_6H_4{-}COO^-} + \mathrm{H_3O^+} \tag{1.28}$$

$$\sigma_X = \log K_X - \log K_H \quad \text{or} \quad \log(K_X/K_H) = -pK_X + pK_H \tag{1.29}$$

Electron-withdrawing substituents are thus
characterized by positive values, whereas
electron-donating ones have negative values. In
an extension of this approach, the ionization
of substituted phenylacetic acids was measured.

$$\mathrm{X{-}C_6H_4{-}CH_2COOH} + \mathrm{H_2O} \rightleftharpoons \mathrm{X{-}C_6H_4{-}CH_2COO^-} + \mathrm{H_3O^+} \tag{1.30}$$

The effect of the 4-Cl substituent on the
ionization of 4-Cl phenylacetic acid (PA) was
found to be proportional to its effect on the
ionization of 4-Cl benzoic acid (BA).

$$\log\left(K'_{Cl(PA)}/K'_{H(PA)}\right) \propto \log\left(K_{Cl(BA)}/K_{H(BA)}\right)$$

Since $\log\left(K_{Cl(BA)}/K_{H(BA)}\right) = \sigma$, then

$$\log\frac{K'_{Cl}}{K'_{H}} = \rho' \sigma \tag{1.31}$$

$\rho$ (rho) is defined as a proportionality or reaction
constant, which is a measure of the susceptibility
of a reaction to substituent effects.
A positive rho value suggests that a reaction is
aided by electron withdrawal from the reaction
site, whereas a negative rho value implies
that the reaction is assisted by electron donation
at the reaction site. Hammett also drew
attention to the fact that a plot of log $K_a$ for
benzoic acids versus log k for ester hydrolysis
of a series of molecules is linear, which suggests
that substituents exert a similar effect in
dissimilar reactions.

$$\log\frac{k_X}{k_H} \propto \log\frac{K_X}{K_H} = \rho \cdot \sigma \tag{1.32}$$

Although this expression is empirical in nature,
it has been validated by the sheer volume
of positive results. It is remarkable because
four different energy states must be related.

A correlation of this type is clearly meaningful;
it suggests that changes in structure
produce proportional changes in the activation
energy $\Delta G^{\ddagger}$ for such reactions. Hence the
name by which the Hammett
equation is universally known: linear free energy
relationship (LFER). Equation 1.32 has
become known as the Hammett equation and
has been applied to thousands of reactions
that take place at or near the benzene ring
bearing substituents at the meta and para positions.
Because of proximity and steric effects,
ortho-substituted molecules do not always
follow this maxim and are subject to
different parameterizations. Thus, an expanded
approach was established by Charton
(79) and Fujita and Nishioka (80). Charton
partitioned the ortho electronic effect into its
inductive, resonance, and steric contributions;
the factors $\alpha$, $\beta$, and $\psi$ are susceptibility
or reaction constants and h is the intercept.

$$\log k = \alpha\sigma_I + \beta\sigma_R + \psi r_v + h \tag{1.33}$$

Fujita and Nishioka used an integrated approach
to deal with ortho substituents in data
sets including meta and para substituents.

$$\log k = \rho\sigma + \delta E_s^{ortho} + fF_{ortho} + c \tag{1.34}$$

For ortho substituents, para sigma values
were used in addition to Taft's $E_s$ values and
Swain-Lupton field constants $F_{ortho}$.
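
The operational definition of sigma (Equation 1.29) and the Hammett relationship (Equation 1.32) can be illustrated with a short Python sketch. The pKa values are approximate aqueous values at 25°C, used here as assumptions for illustration, and the log rate ratios are synthetic numbers generated for a reaction with rho = 2; none of these figures come from the text.

```python
def sigma_from_pka(pka_parent, pka_substituted):
    """Equation 1.29: sigma_X = log K_X - log K_H = pK_H - pK_X."""
    return pka_parent - pka_substituted


def hammett_rho(sigmas, log_rate_ratios):
    """Least-squares slope through the origin for log(k_X/k_H) = rho * sigma."""
    numerator = sum(s * y for s, y in zip(sigmas, log_rate_ratios))
    denominator = sum(s * s for s in sigmas)
    if denominator == 0:
        raise ValueError("at least one nonzero sigma value is required")
    return numerator / denominator


# Approximate pKa values of benzoic acids (illustrative assumptions)
PKA_BENZOIC = 4.20
pka_substituted = {"4-NO2": 3.42, "4-Cl": 3.98, "4-CH3": 4.37}
sigmas = {name: sigma_from_pka(PKA_BENZOIC, pka)
          for name, pka in pka_substituted.items()}

# Synthetic log(k_X/k_H) values for a hypothetical reaction with rho = 2
log_ratios = [2.0 * s for s in sigmas.values()]
rho = hammett_rho(list(sigmas.values()), log_ratios)

for name, value in sigmas.items():
    print(f"sigma({name}) = {value:+.2f}")
print(f"rho = {rho:.2f}")
```

The electron-withdrawing nitro and chloro groups give positive sigma values and the electron-donating methyl group a negative one, in line with the sign convention described above.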

The reason for employing alternative treatments
to ortho-substituted aromatic molecules
is that changes in rate or ionization constants
mediated by meta or para substituents
are mostly changes in $\Delta H^{\ddagger}$ or $\Delta H^{\circ}$ because
substitution does not affect $\Delta S^{\ddagger}$ or $\Delta S^{\circ}$. Ortho
substituents affect both enthalpy and entropy;
the effect on entropy is noteworthy because
entropy is highly sensitive to changes in the
size of reagents and substituents as well as
degree of solvation. Bolton et al. examined the
ionization of substituted benzoic acids and
measured accurate values for $\Delta G$, $\Delta H$, and $\Delta S$
(81). A hierarchy of different scenarios, under
which an LFER operates, was established:

1. $\Delta H^{\circ}$ is constant and $\Delta S^{\circ}$ varies for a series.

2. $\Delta S^{\circ}$ is constant and $\Delta H^{\circ}$ varies.

3. $\Delta H^{\circ}$ and $\Delta S^{\circ}$ vary and are shown to be
linearly related.

Precise measurements indicated that category 3
was the prevalent behavior in benzoic acids.

Despite the extensive and successful use in 
QSAR studies, there are some limitations to 
the Hammett equation. 

1. Primary $\sigma$ values are obtained from the
thermodynamic ionizations of the appropriate
benzoic acids at 25°C; these are reliable
and easily available. Secondary values
are obtained by comparison with another
series of compounds and are thus subject to
error because they are dependent on the
accuracy of a measured series and the
development of a regression line using statistical
methods.

2. In some multisubstituted compounds, the
lack of additivity needs to be noted. Proximal
effects are operative and tend to distort
electronic contributions. For example,

$\sum\sigma_{calc}$(3,4,5-trichlorobenzoic acid) = 0.97;

that is, $2\sigma_m + \sigma_p$ or 2(0.37) + 0.23

$\sum\sigma_{obs}$(3,4,5-trichlorobenzoic acid) = 0.95

Sigma values for smaller substituents are
more likely to be additive. However, in the
case of 3-methyl, 4-dimethylaminobenzoic
acid, the discrepancy is high. For example,

$\sum\sigma_{calc}$(3-CH3, 4-N(CH3)2 benzoic acid) = -0.90

$\sum\sigma_{obs}$(3-CH3, 4-N(CH3)2 benzoic acid) = -0.30

The large discrepancy may be attributed to
the twisting of the dimethylamino substituent
out of the plane of the benzene ring,
resulting in a decrease in resonance. Exner
and his colleagues have critically examined
the use of additivity in the determination of
$\sigma$ constants (82).

3. Changes in mechanism or transition state 
cause discontinuities in Hammett plots. 
Nonlinear plots are often found in reac- 
tions that proceed by two concurrent path- 
ways (83, 84). 

4. Changes in solvent may lead to dissimilarities
in reaction mechanisms. Thus extrapolation
of $\sigma$ values from a polar solvent
(e.g., CH3CN) to a nonpolar solvent such as
benzene has to be approached cautiously.
Solvation properties will differ consider- 
ably, particularly if the transition state is 
polar and/or the substituents are able to 
interact with the solvent. 

5. A strong positional dependency of sigma 
makes it imperative to use appropriate val- 
ues for positional, isomeric substituents. 
Substituents ortho to the reaction center 
are difficult to describe and thus one must 
resort to a Fujita-Nishioka analysis (80). 

6. Through resonance or direct conjugation
effects cause a breakdown in the Hammett
equation. When coupling occurs between
the substituent and the reaction center
through the pi-electron system, reactivity
is enhanced, diminished, or mitigated by
separation. In a study of X-cumyl chlorides,
Brown and Okamoto noticed the strong
conjugative interaction between lone-pair
para substituents and the vacant p-orbital
in the transition state, which led to deviations
in the Hammett plot (85). They defined
a modified LFER applicable to this
situation.

$$\log\frac{k_X}{k_H} = (\rho^+)(\sigma^+) \tag{1.35}$$

$\sigma^+$ was a new substituent constant that expressed
enhanced resonance attributes. A
similar situation was noticed when a strong
donor center was present as a reactant or
formed as a product (e.g., phenols and anilines).
In this case, strong resonance interactions
were possible with electron-withdrawing
groups (e.g., NO2 or CN). A scale for such
substituents was constructed such that

$$\log\frac{k_X}{k_H} = (\rho^-)(\sigma^-) \tag{1.36}$$

One shortcoming of the benzoic acid system
is the extent of coupling between the carboxyl
group and certain lone-pair donors.
Insertion of a methylene group between the core
(benzene ring) and the functional group
(COOH moiety) leads to phenylacetic acids
and the establishment of a $\sigma^{\circ}$ scale from the
ionization of X-phenylacetic acids. A flexible
method of dealing with the variability of the
resonance contribution to the overall electronic
demand of a reaction is embodied in the
Yukawa-Tsuno equation (86). It includes normal
and enhanced resonance contributions to
an LFER.

$$\log\frac{k_X}{k_H} = \rho[\sigma + r(\sigma^+ - \sigma)] \tag{1.37}$$

where r is a measure of the degree of enhanced
resonance interaction in relation to benzoic
acid dissociations (r = 0) and cumyl chloride
hydrolysis (r = 1).
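
The role of r becomes clearer when the two limiting cases are written out. The short derivation below simply substitutes r = 0 and r = 1 into the Yukawa-Tsuno equation; it adds nothing beyond the equations already given.

```latex
\begin{align*}
\log\frac{k_X}{k_H} &= \rho\,[\sigma + r(\sigma^{+} - \sigma)]
  && \text{Yukawa-Tsuno equation (1.37)} \\
r = 0:\quad \log\frac{k_X}{k_H} &= \rho\,\sigma
  && \text{no enhanced resonance; Hammett form (1.32)} \\
r = 1:\quad \log\frac{k_X}{k_H} &= \rho\,[\sigma + \sigma^{+} - \sigma] = \rho\,\sigma^{+}
  && \text{full enhanced resonance; Brown-Okamoto form (1.35)}
\end{align*}
```

Intermediate values of r therefore interpolate linearly between the $\sigma$ and $\sigma^+$ scales.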

Most of the Hammett-type constants pertain
to aromatic systems. In evaluating an
electronic parameter for use in aliphatic systems,
Taft used the relative acid and base hydrolysis
rates for esters. He developed Equation
1.38 as a measure of the inductive effect
($\sigma^*$) of a substituent R' in the ester R'COOR,
where B and A refer to basic and acidic hydrolysis,
respectively.

$$\sigma^* = \frac{1}{2.48}\left[\log(k/k_0)_B - \log(k/k_0)_A\right] \tag{1.38}$$

The factor of 2.48 was used to make $\sigma^*$ equiscalar
with Hammett $\sigma$ values. Later, a $\sigma_I$
scale derived from the ionization of
4-X-bicyclo[2.2.2]octane-1-carboxylic acids was
shown to be related to $\sigma^*$ (87, 88). It is now
more widely used than $\sigma^*$.

$$\sigma_I(X) = 0.45\,\sigma^*(CH_2X) \tag{1.39}$$

Ionization is a function of the electronic
structure of an organic drug molecule. Albert
was the first to clearly delineate the relationship
between ionization and biological activity
(89). Now, $pK_a$ values are widely used as the
independent variable in physical organic reactions
and in biological systems, particularly
when dealing with transport phenomena.
However, caution must be exercised in interpreting
the dependency of biological activity
on $pK_a$ values because $pK_a$ values are inherently
composites of electronic factors that are
used directly in QSAR analysis.

In recent years, there has been a rapid
growth in the application of quantum chemical
methodology to QSAR, by direct derivation
of electronic descriptors from the molecular
wave functions (90). The two most popular
methods used for the calculation of quantum
chemical descriptors are ab initio (Hartree-Fock)
and semiempirical methods. As with other
electronic parameters, QSAR models incorporating
quantum chemical descriptors will include
information on the nature of the intermolecular
forces involved in the biological
response. Unlike other electronic descriptors,
there is no statistical error in quantum chemical
computations. The errors are usually
made in the assumptions that are established
to facilitate calculation (91). Quantum chemical
descriptors such as net atomic charges,
highest occupied molecular orbital/lowest unoccupied
molecular orbital (HOMO-LUMO)
energies, frontier orbital electron densities,
and superdelocalizabilities have been shown
to correlate well with various biological activities
(92). A mixed approach using frontier orbital
theory and topological parameters has
been used to calculate Hammett-like substituent
constants (93).

$$\sigma = -2.480\,\Delta N - 7.894\,\Delta E - 0.605\,\frac{D_X}{D_H}\cdot\frac{EA_H}{EA_X} + 0.009\,S_X + 0.028\sum\pi + 0.279 \tag{1.40}$$

$n = 150$, $r^2 = 0.947$, $s = 0.079$, $F = 789.9$

In Equation 1.40, $\Delta N$ represents the extent
of electron transfer between interacting acid-base
systems; $\Delta E$ is the energy decrease in
bimolecular systems underlying electron
transfer; $D_X/D_H \cdot (EA_H/EA_X)$ corresponds to
electron affinity and distance terms; and
$S_X$ factors the electrotopological state index,
whereas $\sum\pi$ is the number of all $\pi$-electrons
in the functional group. Observed
principal component analysis (PCA) clustering
of 66 descriptors derived from AM1 calculations
was similar to that previously reported
for monosubstituted benzenes (94,
95). The advantages of quantum chemical
descriptors are that they have definite
meaning and are useful in the elucidation of
intra- and intermolecular interactions and
can easily be derived from the theoretical
structure of the molecule.

3.2 Hydrophobicity Parameters 

More than a hundred years ago, Meyer and 
Overton made their seminal discovery on the 
correlation between oil/water partition coeffi- 
cients and the narcotic potencies of small or- 
ganic molecules (7, 8). Ferguson extended this 
analysis by placing the relationship between 
depressant action and hydrophobicity in a 
thermodynamic context; the relative satura- 
tion of the depressant in the biophase was a 
critical determinant of its narcotic potency (9). 
At this time, the success of the Hammett equa- 
tion began to permeate structure-activity 
studies and hydrophobicity as a determinant 
was relegated to the background. In a land- 
mark study, Hansch and his colleagues de- 



vised and used a multiparameter approach 
that included both electronic and hydrophobic 
terms, to establish a QSAR for a series of plant 
growth regulators (16). This study laid the ba- 
sis for the development of the QSAR paradigm 
and also firmly established the importance of 
lipophilicity in biosystems. Over the last 40 
years, no other parameter used in QSAR has 
generated more interest, excitement, and con- 
troversy than hydrophobicity (96). Hydropho- 
bic interactions are of critical importance in 
many areas of chemistry. These include en- 
zyme-ligand interactions, the assembly of lip- 
ids in biomembranes, aggregation of surfac- 
tants, coagulation, and detergency (97-100). 
The integrity of biomembranes and the ter- 
tiary structure of proteins in solution are de- 
termined by apolar-type interactions. 

Molecular recognition depends strongly on
hydrophobic interactions between ligands and
receptors. Excellent treatises on this subject
have been written by Taylor (101) and Blokzijl
and Engberts (51). Despite extensive usage of
the term hydrophobic bond, it is well known
that there is no strong attractive force between
apolar molecules (102). Frank and
Evans were the first to apply a thermodynamic
treatment to the solvation of apolar molecules
in water at room temperature (103). Their
"iceberg" model suggested that a large entropic
loss ensued after the dissolution of apolar
compounds, owing to the increased structure of
the water molecules surrounding the apolar solute.
The quantitation of this model led to the
development of the "flickering" cluster model
of Némethy and Scheraga, which emphasized
the formation of hydrogen bonds in liquid water
(104). The classical model for hydrophobic
interactions was delineated by Kauzmann to
describe the van der Waals attractions between
the nonpolar parts of two molecules immersed
in water. Given that van der Waals
forces operate over short distances, the water
molecules are squeezed out in the vicinity of
the mutually bound apolar surfaces (49). The
driving force for this behavior is not that alkanes
"hate" water, but rather that water
"hates" alkanes (105, 106). Thus, the gain in
entropy appears as the critical driving force
for hydrophobic interactions that are primarily
governed by the repulsion of hydrophobic
solutes from the solvent water and the limited
but important capacity of water to maintain
its network of hydrogen bonds.

Hydrophobicities of solutes can readily be
determined by measuring partition coefficients,
designated as P. Partition coefficients
deal with neutral species, whereas distribution
ratios incorporate concentrations of
charged and/or polymeric species as well. By
convention, P is defined as the ratio of the
concentration of the solute in octanol to its
concentration in water.

$$P = [\text{conc}]_{octanol}/[\text{conc}]_{aqueous} \tag{1.41}$$
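
The distinction between a partition coefficient and a distribution ratio can be made concrete with a small Python sketch. The log D expression for a monoprotic acid used here is a common textbook approximation that assumes only the neutral form enters the octanol phase; that assumption, the function names, and the numerical inputs are illustrative and do not come from the text.

```python
import math


def log_p(conc_octanol, conc_water):
    """Equation 1.41: log of the octanol/water concentration ratio (neutral species)."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


def log_d_monoprotic_acid(logp, pka, ph):
    """Approximate log D, assuming the ionized form does not partition into octanol."""
    return logp - math.log10(1.0 + 10.0 ** (ph - pka))


# Hypothetical measurement: 2.0e-3 M in octanol, 2.0e-5 M in water
logp = log_p(2.0e-3, 2.0e-5)
print(f"log P = {logp:.2f}")
print(f"log D at pH 7.4 for an acid with pKa 4.2 = "
      f"{log_d_monoprotic_acid(logp, pka=4.2, ph=7.4):.2f}")
```

For an acid that is largely ionized at the measurement pH, log D falls well below log P, which is why the pH of the aqueous phase must be reported with any distribution ratio.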

It was fortuitous that octanol was chosen as
the solvent most likely to mimic the biomembrane.
Extensive studies over the last 35 years
(40,000 experimental P values in 400 different
solvent systems) have failed to dislodge octanol
from its secure perch (107, 108).

Octanol is a suitable solvent for the mea- 
surement of partition coefficients for many 
reasons (109, 110). It is cheap, relatively nontoxic,
and chemically unreactive. The
droxyl group has both hydrogen bond acceptor 
and hydrogen bond donor features capable of 
interacting with a large variety of polar 
groups. Despite its hydrophobic attributes, it 
is able to dissolve many more organic com- 
pounds than can alkanes, cycloalkanes, or ar- 
omatic hydrocarbons. It is UV transparent 
over a large range and has a vapor pressure 
low enough to allow for reproducible measure- 
ments. It is also elevated enough to allow for 
its removal under mild conditions. In addition, 
water saturated with octanol contains only
$10^{-3}$ M octanol at equilibrium, whereas octanol
saturated with water contains 2.3 M of
water. Thus, polar groups need not be totally
dehydrated in transfer from the aqueous
phase to the organic phase. Likewise, hydrophobic
solutes are not appreciably solvated by
the $10^{-3}$ M octanol in the water phase unless
their intrinsic log P is above 6.0. Octanol begins
to absorb light below 220 nm and thus
solute concentration determinations can be 
monitored by UV spectroscopy. More impor- 
tant, octanol acts as an excellent mimic for 
biomembranes because it shares the traits of 



amphiphilicity and hydrogen-bonding capabil- 
ity with phospholipids and proteins found in 
biological membranes. 

The choice of the octanol/water partitioning
system as a standard reference for assessing
the compartmental distribution of molecules
of biological interest was recently
investigated by molecular dynamics simulations
(111). It was determined that pure 1-octanol
contains a mix of hydrogen-bonded
"polymeric" species, mostly four-, five-, and
six-membered ring clusters at 40°C. These
small ring clusters form a central hydroxyl
core from which their corresponding alkyl
chains radiate outward. On the other hand,
water-saturated octanol tends to form well-defined,
inverted, micellar aggregates. Long
hydrogen-bonded chains are absent and water
molecules congregate around the octanol
hydroxyls. "Hydrophilic channels" are formed by
cylindrical formation of water and octanol
hydroxyls with the alkyl chains extending outward.
Thus, water-saturated octanol has centralized
polar cores where polar solutes can
localize. Hydrophobic solutes would migrate
to the alkyl-rich regions. This is an elegant
study that provides insight into the partitioning
of benzene and phenol by analyzing the
structure of the octanol/water solvation shell
and delineating octanol's capability to serve as
a surrogate for biomembranes.

The so-called shake-flask method is most
commonly used to measure partition coefficients
with great accuracy and precision,
over a log P range that extends from -3 to +6
(112, 113). The procedure calls for the use of
pure, distilled, deionized water, high-purity
octanol, and pure solutes. At least three
concentration levels of solute should be analyzed,
and the volumes of octanol and water should
be varied according to a rough estimate of the
log P value. Care should be exercised to ensure
that the eventual amounts of the solute in
each phase are about the same after equilibrium.
Standard concentration curves using
three to four known concentrations in water
saturated with octanol are usually established.
Generally, most methods employ a UV-based
procedure, although GC and HPLC may
also be used to quantitate the concentration of
the solute.
Generally, 10-mL stoppered centrifuge tubes or
200-mL centrifuge bottles are used. They are
inverted gently for 2-3 min and then centrifuged at
1000-2000 g for 20 min before the phases are
analyzed. Analysis of both phases is highly
recommended, to minimize errors incurred by
adsorption to glass walls at low solute concentration. For
highly hydrophobic compounds, the slow stirring
procedure of de Bruijn and Hermens is
recommended (114). The filter probe extractor system of
Tomlinson et al. is a modified, automated,
shake-flask method, which is efficient, fast, reliable, and
flexible (115).
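
The arithmetic behind a shake-flask determination can be sketched in a few lines of Python. This is a minimal illustration rather than a validated protocol: the function names and the example concentrations are hypothetical, and it assumes the solute concentration has already been measured in both phases, in the same units, at several concentration levels.

```python
import math
from statistics import mean, stdev

def log_p(conc_octanol: float, conc_water: float) -> float:
    """Return log P = log10(C_octanol / C_water) for one equilibrated sample."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)

def summarize(samples):
    """Mean and sample standard deviation of log P over several runs."""
    values = [log_p(c_oct, c_wat) for c_oct, c_wat in samples]
    return mean(values), stdev(values)

# Hypothetical (octanol, water) concentrations in mol/L at three solute
# levels, as the procedure recommends; the numbers are invented.
runs = [(1.35e-2, 1.0e-4), (2.70e-2, 2.0e-4), (5.50e-2, 4.0e-4)]
avg, sd = summarize(runs)
print(f"log P = {avg:.2f} +/- {sd:.2f}")
```

Analyzing both phases, as recommended above, is what makes the ratio meaningful; inferring one concentration by difference would fold any adsorption losses into the result.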

Partition coefficients from different solvent
systems can also be compared and converted
to the octanol/water scale, as was suggested
by Collander (116). He stressed the
importance of the following linear relationship:
$\log P_2 = a \log P_1 + b$. This type of relationship
works well when the two solvents are
both alkanols. However, when two solvent systems
have varying hydrogen bond donor and
acceptor capabilities, the relationship tends to
fray. A classical example involves the relationship
between log P values in chloroform and
octanol (117, 118).

$$\log P_{\mathrm{CHCl_3}} = 1.012 \log P_{\mathrm{oct}} - 0.513 \tag{1.42}$$
$$n = 72,\quad r^2 = 0.811,\quad s = 0.733$$

Only 66% of the variance in the data is explained
by this equation. However, a separation
of the various solutes into H-bond donors,
acceptors, and neutrals helped account for 94% of
the variance in the data. These restrictions led
Seiler to extend the Collander equation by
incorporating a corrective term for H-bonding in the
cyclohexane system (119). Fujita generalized
this approach and formulated Equation 1.43 as
shown below (120).

$$\log P_2 = a \log P_1 + \sum b_i\, HB_i + C \tag{1.43}$$

$P_1$ is the partition coefficient in the reference solvent and $HB_i$ is an
H-bonding parameter. Leahy et al. suggested that
a more sophisticated approach incorporating
four model systems would be needed to adequately
address issues of solute partitioning in
membranes (121). Thus, four distinct solvent
types were chosen (apolar, amphiprotic, proton
donor, and proton acceptor), represented by
alkanes, octanol, chloroform, and
propylene glycol dipelargonate (PGDP), respectively.
The demands of measuring four partition
coefficients for each solute have slowed progress
in this particular area.
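
A Collander-type conversion is a one-line linear map. The sketch below uses the coefficients of Equation 1.42 (octanol to chloroform); the function names are illustrative, and the inverse is plain algebra rather than a separately fitted regression. Given the reported standard deviation (s = 0.733), any converted value should be treated as a rough estimate only.

```python
def collander(log_p_ref: float, a: float, b: float) -> float:
    """Collander-type linear conversion: log P2 = a * log P1 + b."""
    return a * log_p_ref + b

# Slope and intercept of Equation 1.42 (octanol -> chloroform).
A_CHCL3, B_CHCL3 = 1.012, -0.513

def log_p_chloroform(log_p_octanol: float) -> float:
    """Estimate log P(chloroform/water) from log P(octanol/water)."""
    return collander(log_p_octanol, A_CHCL3, B_CHCL3)

def log_p_octanol_from_chloroform(log_p_chcl3: float) -> float:
    """Algebraic inverse of Equation 1.42 (not an independent fit)."""
    return (log_p_chcl3 - B_CHCL3) / A_CHCL3

# Hypothetical solute with log P(octanol) = 2.0
print(round(log_p_chloroform(2.0), 3))
```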

3.2.1 Determination of Hydrophobicity by
Chromatography. Chromatography provides
an alternate tool for the estimation of
hydrophobicity parameters. $R_m$ values derived from
thin-layer chromatography provide a simple,
rapid, and easy way to ascertain approximate
values of hydrophobicity (122, 123).

$$R_m = \log\left(\frac{1}{R_f} - 1\right) \tag{1.44}$$
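
As a small worked illustration of Equation 1.44, the following sketch converts thin-layer retention factors into $R_m$ values. The function name and the sample $R_f$ readings are hypothetical; the only assumption beyond the equation itself is that $R_f$ lies strictly between 0 and 1, which is required for the logarithm to be defined.

```python
import math

def rm_from_rf(rf: float) -> float:
    """R_m = log10(1/R_f - 1), Equation 1.44; R_f must lie strictly in (0, 1)."""
    if not 0.0 < rf < 1.0:
        raise ValueError("R_f must be between 0 and 1 (exclusive)")
    return math.log10(1.0 / rf - 1.0)

# Hypothetical TLC readings: a spot that travels less far (small R_f)
# gives a larger, positive R_m.
for rf in (0.2, 0.5, 0.8):
    print(rf, round(rm_from_rf(rf), 3))
```

An $R_f$ of 0.5 maps to $R_m = 0$, and the scale is antisymmetric about that point.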

Other recent developments in chromatographic
techniques have produced
powerful tools to rapidly and accurately
measure octanol/water partition coefficients.
Countercurrent chromatography is one of
these methods. The stationary and mobile
phases consist of two immiscible solvents (water
and octanol), and the total volume of the
liquid stationary phase is used for solute
partitioning (124, 125). Log $P_{app}$ values of several
diuretics, including ionizable drugs, have been
measured at different pH values using
countercurrent chromatography; the log P values
ranged from -1.3 to 2.7 and were consistent
with literature values (126).

Recently, a rapid method for the determination
of partition coefficients using gradient
reversed-phase high-pressure liquid chromatography
(RP-HPLC) was developed. This
method is touted as a high-throughput
hydrophobicity screen for combinatorial libraries
(127, 128). A chromatography hydrophobicity
index (CHI) was established for a diverse set of
compounds. Acetonitrile was used as the modifier
and 50 mM ammonium acetate as the mobile
phase (127). A linear relationship was
established between Clog P and $CHI_N$ for
neutral molecules.

$$\mathrm{Clog}\,P = 0.057\, CHI_N - 1.107 \tag{1.45}$$
$$n = 52,\quad r^2 = 0.724,\quad s = 0.82,\quad F = 131$$
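
Equation 1.45 can be applied directly as a screening estimate. The sketch below is illustrative only: the function names and the sample index value are hypothetical, and with $r^2$ = 0.724 the relationship is a coarse filter for neutral molecules, not a substitute for measurement.

```python
def clogp_from_chi(chi_n: float) -> float:
    """Equation 1.45: Clog P = 0.057 * CHI_N - 1.107 (neutral molecules)."""
    return 0.057 * chi_n - 1.107

def chi_from_clogp(clogp: float) -> float:
    """Algebraic inverse of Equation 1.45."""
    return (clogp + 1.107) / 0.057

# Hypothetical chromatographic hydrophobicity index of 50
print(round(clogp_from_chi(50.0), 3))
```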

A more recent study using RP-HPLC for the
determination of log P (octanol) values for
neutral and weakly acidic and basic drugs
revealed an excellent correlation between
log $P_{oct}$ and log $k'_w$ values (129). Log $P_{oct}$
values determined in this system are referred
to as Elog $P_{oct}$. They were expressed
in terms of solvation parameters.

$$\mathrm{Elog}\,P_{oct} = 0.204 + 0.452 R_2 - 1.055 \pi_2^{H} - 0.041 \sum\alpha_2^{H} - 3.410 \sum\beta_2^{0} + 3.842 V_x \tag{1.46}$$
$$n = 35,\quad r^2 = 0.960,\quad s = 0.244$$

In this equation, $R_2$ is the excess molar refraction;
$\pi_2^{H}$ is the dipolarity/polarizability;
$\sum\alpha_2^{H}$ and $\sum\beta_2^{0}$ are the summation of hydrogen
bond acidity and basicity values, respectively;
and $V_x$ is McGowan's volume.

3.2.2 Calculation Methods. Partition coefficients
are additive-constitutive, free energy-related
properties. Log P represents the overall
hydrophobicity of a molecule, which
includes the sum of the hydrophobic contributions
of the "parent" molecule and its substituent.
Thus, the π value for a substituent
may be defined as

$$\pi_X = \log P_{R-X} - \log P_{R-H} \tag{1.47}$$

$\pi_H$ is set to zero. The π value for a nitro
substituent is calculated from the log P of
nitrobenzene and benzene.

$$\pi_{NO_2} = \log P_{\mathrm{nitrobenzene}} - \log P_{\mathrm{benzene}} = 1.85 - 2.13 = -0.28$$
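
The definition in Equation 1.47 and the additive use of π can be sketched as follows. The function and dictionary names are illustrative; the π constants are copied from Table 1.4 and the benzene log P from the text. The additive estimate deliberately ignores the electronic interactions and correction factors discussed in this section, so it is a first approximation only.

```python
LOG_P_BENZENE = 2.13  # parent log P quoted in the text

def pi_value(log_p_substituted: float, log_p_parent: float) -> float:
    """Equation 1.47: pi_X = log P(R-X) - log P(R-H)."""
    return log_p_substituted - log_p_parent

# A few aromatic pi constants taken from Table 1.4.
PI = {"H": 0.00, "NO2": -0.28, "OH": -0.67, "Cl": 0.71, "CH3": 0.56}

def estimate_log_p(substituents, log_p_parent: float = LOG_P_BENZENE) -> float:
    """Naive additive estimate: parent log P plus the sum of pi constants."""
    return log_p_parent + sum(PI[s] for s in substituents)

print(round(pi_value(1.85, 2.13), 2))    # nitro group, as worked in the text
print(round(estimate_log_p(["Cl"]), 2))  # additive estimate for chlorobenzene
```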

An extensive list of π values for aromatic
substituents appears in Table 1.4. π values
for side chains of amino acids in peptides have
been well characterized and are readily
available (130-132). Aliphatic fragment values
were developed a few years later. For a more
extensive list of substituent constants,
refer to the compilation by Hansch
et al. (133). Initially, the π-system was applied
only to substitution on aromatic rings and
when the hydrogen being replaced was of
innocuous character. It was apparent from the
beginning that not all hydrogens on aromatic
systems could be substituted without correction
factors because of strong electronic
interactions. It became necessary to determine π
values in various electron-rich and -deficient
systems (e.g., X-phenols and X-nitrobenzenes).
Correction factors were introduced for
special features such as unsaturation, branching,
and ring fusion. The proliferation of
π-scales made it difficult to ascertain which
system was more appropriate for usage,
particularly with complex structures.

The shortcomings of this approach pro- 
vided the impetus for Nys and Rekker to de- 
sign the fragmental method, a "reductionist" 
approach, which was based on the statistical 
analysis of a large number of measured parti- 
tion coefficients and the subsequent assign- 
ment of appropriate values for particular mo- 
lecular fragments (118, 134). Hansch and Leo 
took a "constructionist" approach and devel- 
oped a fragmental system that included cor- 
rection factors for bonds and proximity effects 
(1, 135). Labor-intensive efforts and inconsis- 
tency in manual calculations were eliminated 
with the debut of the automated system 
CLOGP and its powerful SMILES notation 
(136-138). Recent analysis of the accuracy of 
CLOGP yielded Equation 1.48 (139). 

$$\mathrm{MLOGP} = 0.959\,\mathrm{CLOGP} + 0.08 \tag{1.48}$$
$$n = 12{,}107,\quad r^2 = 0.973,\quad s = 0.299$$

The Clog P values of 228 structures (1.8%
of the data set) were not well predicted. It
must be noted that Starlist (most accurate values
in the database) contains almost 300
charged nitrogen solutes (ammonium, pyridinium,
imidazolium, etc.) and over 2200 in
all, which amounts to 5% of Masterfile (database
of measured values). CLOGP adequately
handles these molecules within the 0.30 standard
deviation limit. Most other programs
make no attempt to calculate them. For more
details on calculating log $P_{oct}$ from structures,
see excellent reviews by Leo (140, 141).

The proliferation of methodologies and 
programs to calculate partition coefficients 
continues unabated. These programs are 
based on substructure approaches or whole- 
molecule approaches (142, 143). Substructure 

Table 1.4 Substituent Constants for QSAR Analysis

No.  Substituent          π     MR      L     B1     B5     σp     σm
1    N(CH3)3+         -5.96   1.94   4.02   2.57   3.11   0.82   0.88
2    EtN(CH3)3+       -5.44   2.87   5.58   1.52   4.53   0.13   0.16
3    CH2N(CH3)3+      -4.57   2.40   4.83   1.52   4.08   0.44   0.40
4    CO2-             -4.36   0.61   3.53   1.60   2.66   0.00  -0.10
5    NH3+             -4.19   0.55   2.78   1.49   1.97   0.60   0.86
6    PrN(CH3)3+       -4.15   3.33   6.88   1.52   5.49  -0.01   0.06
7    CH2NH3+          -4.09   1.01   4.02   1.52   3.05   0.29   0.32
8    IO2              -3.46   6.35   4.25   2.15   3.66   0.78   0.68
9    C(CN)3           -2.33   1.86   3.99   2.87   4.12   0.96   0.97
10   NHNO2            -2.27   1.07   4.50   1.35   3.66   0.57   0.91
11   C(NO2)3          -2.01   2.27   4.59   2.55   3.72   0.82   0.72
12   SO2(NH2)         -1.82   1.23   4.02   2.04   3.05   0.60   0.53
13   C(CN)=C(CN)2     -1.77   2.58   6.46   1.61   5.17   0.98   0.77
14   CH2C=O(NH2)      -1.68   1.44   4.58   1.52   4.37   0.07   0.06
15   N(COCH3)2        -1.68   2.48   4.45   1.35   4.33   0.33   0.35
16   SO2CH3           -1.63   1.35   4.11   2.03   3.17   0.72   0.60
17   P(O)(OH)2        -1.59   1.26   4.22   2.12   2.88   0.42   0.36
18   S=O(CH3)         -1.58   1.37   4.11   1.40   3.17   0.49   0.52
19   N(SO2CH3)2       -1.51   3.12   4.83   1.36   3.72   0.49   0.47
20   C=O(NH2)         -1.49   0.98   4.06   1.50   3.07   0.36   0.28
21   CH(CN)2          -1.45   1.43   3.99   1.85   4.12   0.52   0.53
22   CH2NHCOCH3       -1.43   1.96   5.67   1.52   4.75  -0.05   0.05
23   NHC=S(NH2)       -1.40   2.22   5.06   1.35   4.18   0.16   0.22
24   NH(OH)           -1.34   0.72   3.87   1.35   2.63  -0.34  -0.04
25   CH=NNHCONHNH2    -1.32   2.42   7.57   1.60   4.55   0.16   0.22
26   NHC=O(NH2)       -1.30   1.37   5.06   1.35   3.61  -0.24  -0.03
27   C=O(NHCH3)       -1.27   1.46   5.00   1.54   3.16   0.36   0.35
28   2-Aziridinyl     -1.23   1.19   4.14   1.55   3.24  -0.10  -0.06
29   NH2              -1.23   0.54   2.78   1.35   1.97  -0.66  -0.16
30   NHSO2CH3         -1.18   1.82   4.70   1.35   4.13   0.03   0.20
31   P(O)(OCH3)2      -1.18   2.19   5.04   2.42   3.25   0.53   0.42
32   C(CH3)(CN)2      -1.14   1.90   4.11   2.81   4.12   0.57   0.60
33   N(CH3)SO2CH3     -1.11   2.34   4.83   1.35   3.72   0.24   0.21
34   SO2Et            -1.10   1.81   4.92   2.03   3.49   0.77   0.66
35   CH2NH2           -1.04   0.91   4.02   1.52   3.05  -0.11  -0.03
36   1-Tetrazolyl     -1.04   1.83   5.28   1.71   3.12   0.50   0.52
37   CH2OH            -1.03   0.72   3.97   1.52   2.70   0.00   0.00
38   N(CH3)COCH3      -1.02   1.96   4.77   1.35   3.71   0.26   0.31
39   NHCHO            -0.98   1.03   4.22   1.35   3.61   0.00   0.19
40   NHC(=O)CH3       -0.97   1.49   5.09   1.35   3.61   0.00   0.21
41   C(CH3)(NO2)2     -0.88   2.17   4.59   2.55   3.72   0.61   0.54
42   NHNH2            -0.88   0.84   3.47   1.35   2.97  -0.55  -0.02
43   OSO2CH3          -0.88   1.70   4.66   1.35   4.10   0.36   0.39
44   SO2N(CH3)2       -0.78   2.19   4.83   2.03   4.08   0.65   0.51
45   NHC=S(NHC2H5)    -0.71   3.17   7.22   1.45   4.38   0.07   0.30
46   SO2(CHF2)        -0.68   1.31   4.11   2.03   3.70   0.86   0.75
47   OH               -0.67   0.29   2.74   1.35   1.93  -0.37   0.12
48   CHO              -0.65   0.69   3.53   1.60   2.36   0.42   0.35
49   CH2CHOHCH3       -0.64   1.64   4.92   1.52   3.78  -0.17  -0.12
50   CS(NH2)          -0.64   1.81   4.10   1.64   3.18   0.30   0.25
51   OC=O(CH3)        -0.64   1.25   4.74   1.35   3.67   0.31   0.39
52   SOCHF2           -0.63   1.33   4.70   1.40   3.70   0.58   0.54
53   4-Pyrimidinyl    -0.61   2.18   5.29   1.71   3.11   0.63   0.30
54   2-Pyrimidinyl    -0.61   2.18   6.28   1.71   3.11   0.53   0.23

Table 1.4 (Continued)

No.  Substituent          π     MR      L     B1     B5     σp     σm
55   P(CF3)2          -0.59   1.99   4.96   1.40   3.86   0.69   0.60
56   CH2CN            -0.57   1.01   3.99   1.52   4.12   0.18   0.16
57   CN               -0.57   0.63   4.23   1.60   1.60   0.66   0.56
58   COCH3            -0.55   1.12   4.06   1.60   3.13   0.50   0.38
59   CH2P=O(OEt)2     -0.54   3.58   7.10   1.52   5.73   0.06   0.12
60   P=O(OEt)2        -0.52   3.12   6.26   2.52   5.58   0.60   0.55
61   NHCOOMe          -0.52   1.57   5.84   1.45   3.99  -0.17  -0.02
62   NHC=O(NHC2H5)    -0.50   2.32   7.29   1.45   3.98  -0.26   0.04
63   NHC=O(CH2Cl)     -0.50   1.98   6.26   1.55   4.26  -0.03   0.17
64   NHCH3            -0.47   1.03   3.53   1.35   3.08  -0.70  -0.21
65   N(CH3)COCF3      -0.46   1.95   5.20   1.56   3.96   0.39   0.41
66   C=S(NHCH3)       -0.46   2.23   5.00   1.88   3.18   0.34   0.30
67   NHC=S(CH3)       -0.42   2.34   5.09   1.45   4.38   0.12   0.24
68   C(Et)(NO2)2      -0.35   3.66   4.92   2.55   3.72   0.64   0.56
69   CO2H             -0.32   0.69   3.91   1.60   2.66   0.45   0.37
70   C(OH)(CH3)2      -0.32   1.64   4.11   2.40   3.17   0.60   0.47
71   EtCO2H           -0.29   1.65   5.97   1.52   3.31  -0.07  -0.03
72   NO2              -0.28   0.74   3.44   1.70   2.44   0.78   0.71
73   CH=NNHCSNH2      -0.27   2.96   7.16   1.60   5.41   0.40   0.45
74   NHCN             -0.26   1.01   3.90   1.35   4.05   0.06   0.21
75   CH2C(OH)(CH3)2   -0.24   2.11   4.92   1.52   4.19  -0.17  -0.16
76   CH=CHCHO         -0.23   1.69   5.76   1.60   3.46   0.13   0.24
77   NHCH2CO2Et       -0.21   2.69   7.91   1.35   5.77  -0.68  -0.10
78   CH2OCH3          -0.21   1.21   4.78   1.52   3.40   0.01   0.08
79   NHC=OCH(CH3)2    -0.18   2.43   5.53   1.35   4.09  -0.10   0.11
80   CH2OC=O(CH3)     -0.17   1.65   5.46   1.52   4.46   0.05   0.04
81   CH2N(CH3)2       -0.15   1.87   4.83   1.52   4.08   0.01   0.00
82   CH2SCN           -0.14   1.81   6.63   1.52   3.41   0.14   0.12
83   1-Aziridinyl     -0.12   1.35   4.14   1.35   3.24  -0.22  -0.07
84   NO               -0.12   0.52   3.44   1.70   2.44   0.91   0.62
85   ONO2             -0.12   0.85   4.46   1.35   3.62   0.70   0.55
86   S=O(C6H5)        -0.07   3.34   4.62   1.40   6.02   0.44   0.50
87   CH2SO2C6H5       -0.06   3.79   8.33   1.52   3.78   0.16   0.15
88   OCH3             -0.02   0.79   3.98   1.35   3.07  -0.27   0.12
89   C=O(OCH3)        -0.01   1.29   4.73   1.64   3.36   0.45   0.36
90   H                 0.00   0.10   2.06   1.00   1.00   0.00   0.00
91   C=O(CF3)          0.02   1.12   4.65   1.70   3.67   0.80   0.63
92   CH=C(CN)2         0.05   1.97   6.46   1.60   5.17   0.84   0.66
93   SO2(F)            0.05   0.87   3.33   2.01   2.70   0.91   0.80
94   COEt              0.06   1.58   4.87   1.63   3.45   0.48   0.38
95   C(CF3)3           0.07   2.08   4.11   3.13   3.64   0.55   0.55
96   NH-Et             0.08   1.50   4.83   1.35   3.42  -0.61  -0.24
97   NHC=O(CF3)        0.08   1.43   5.62   1.79   3.61   0.12   0.30
98   SC=O(CH3)         0.10   1.84   5.11   1.70   4.01   0.44   0.39
99   CF3               0.10   0.50   3.30   1.99   2.61   0.54   0.43
100  OCH2F             0.10   0.72   4.57   1.35   3.07   0.02   0.20
101  CH=CHNO2(TR)      0.11   1.64   4.29   1.60   4.78   0.26   0.32
102  CH2F              0.13   0.54   3.30   1.52   2.61   0.11   0.12
103  F                 0.14   0.09   2.65   1.35   1.35   0.06   0.34
104  C(OMe)3           0.14   2.48   4.78   2.56   4.29  -0.04  -0.03
105  SeCF3             0.15   1.63   4.50   1.85   4.09   0.45   0.44
106  NHC=O(OEt)        0.17   2.12   7.25   1.35   3.92  -0.15   0.11
107  CH2Cl             0.17   1.05   3.89   1.52   3.46   0.12   0.11
108  N(CH3)2           0.18   1.56   3.53   1.35   3.08  -0.83  -0.16

Table 1.4 (Continued)

No.  Substituent          π     MR      L     B1     B5     σp     σm
109  CHF2              0.21   0.52   3.30   1.71   2.61   0.32   0.29
110  CCCF3             0.22   1.41   5.90   1.99   2.61   0.51   0.41
111  SO2C6H5           0.27   3.32   5.86   2.03   6.02   0.68   0.62
112  COCH(CH3)2        0.29   1.98   4.84   1.99   4.08   0.47   0.38
113  OCHF2             0.31   0.79   3.98   1.35   3.61   0.18   0.31
114  CH2SO2CF3         0.33   1.75   5.35   1.52   4.07   0.31   0.29
115  C(NO2)(CH3)2      0.33   2.06   4.59   2.58   3.72   0.20   0.18
116  P(O)(OPr)2        0.35   4.05   7.07   2.52   6.90   0.50   0.38
117  CH2S=O(CF3)       0.37   1.90   5.35   1.52   4.07   0.24   0.25
118  OCH2CH3           0.38   1.25   4.80   1.35   3.36  -0.24   0.10
119  SH                0.39   0.92   3.47   1.70   2.33   0.15   0.25
120  N=NCF3            0.40   1.39   5.45   1.70   3.48   0.68   0.56
121  CCH               0.40   0.96   4.66   1.60   1.60   0.23   0.21
122  N=CCl2            0.41   1.84   5.65   1.70   4.54   0.13   0.21
123  SCCH              0.41   1.62   4.08   1.70   4.85   0.19   0.26
124  SCN               0.41   1.34   4.08   1.70   4.45   0.52   0.51
125  P(CH3)2           0.44   2.12   3.88   2.00   3.32   0.06   0.03
126  NHSO2C6H5         0.45   3.79   8.24   1.35   3.72   0.01   0.16
127  SO2NHC6H5         0.45   3.78   8.24   2.03   4.50   0.65   0.56
128  CH2CF3            0.45   0.97   4.70   1.52   3.70   0.09   0.12
129  NNN               0.46   1.02   4.62   1.50   4.18   0.08   0.37
130  NNN               0.46   1.02   4.62   1.50   4.18   0.08   0.37
131  4-Pyridyl         0.46   2.30   5.92   1.71   3.11   0.44   0.27
132  N=NN(CH3)2        0.46   2.09   5.68   1.77   3.90   0.44   0.27
133  C=O(NHC6H5)       0.49   3.54   8.24   1.63   4.85  -0.03  -0.05
134  2-Pyridyl         0.50   2.30   6.28   1.71   3.11   0.41   0.23
135  OCH2CH=CH2        0.51   1.61   6.22   1.35   4.42   0.17   0.33
136  C=O(OEt)          0.51   1.75   5.95   1.64   4.41  -0.25   0.09
137  S=O(CF3)          0.53   1.31   4.70   1.40   3.70   0.45   0.37
138  CHOHC6H5          0.54   3.15   4.62   1.73   6.02   0.69   0.63
139  OCH2Cl            0.54   1.20   5.44   1.35   3.13  -0.03   0.00
140  SO2(CF3)          0.55   1.29   4.70   2.03   3.70   0.08   0.25
141  CH3               0.56   0.57   2.87   1.52   2.04   0.96   0.83
142  SCH3              0.61   1.38   4.30   1.70   3.26  -0.17  -0.07
143  SC=O(CF3)         0.66   1.82   5.55   1.70   4.51   0.00   0.15
144  COC(CH3)3         0.69   2.44   4.87   1.87   4.42   0.46   0.48
145  CH=NC6H5          0.69   3.30   8.50   1.70   4.07   0.32   0.27
146  P=O(C6H5)2        0.70   5.93   5.40   2.68   6.19   0.42   0.35
147  Cl                0.71   0.60   3.52   1.80   1.80   0.53   0.38
148  N=CHC6H5          0.72   3.30   8.40   1.70   4.65   0.23   0.37
149  SeCH3             0.74   1.70   4.52   1.85   3.63  -0.55  -0.08
150  SCH2F             0.74   1.34   4.89   1.70   3.41   0.00
151  OCH=CH2           0.75   1.14   4.98   1.35   3.65   0.20   0.23
152  CH2Br             0.79   1.34   4.09   1.52   3.75  -0.09   0.21
153  CCCH3             0.81   1.41   5.47   1.60   2.04   0.14   0.12
154  CH=CH2            0.82   1.10   4.29   1.60   3.09   0.03   0.21
155  Br                0.86   0.89   3.82   1.95   1.95  -0.16  -0.08
156  NHSO2CF3          0.93   1.75   5.26   1.35   4.00   0.23   0.39
157  OSO2C6H5          0.93   3.67   8.20   1.35   3.64   0.39   0.44
158  1-Pyrryl          0.95   1.95   5.44   1.71   3.12   0.33   0.36
159  N(CH3)SO2CF3      1.00   2.28   5.26   1.54   4.00   0.37   0.47
160  SCHF2             1.02   1.38   4.30   1.70   3.94   0.44   0.46
161  CH2CH3            1.02   1.03   4.11   1.52   3.17   0.37   0.33
162  OCF3              1.04   0.79   4.57   1.35   3.61  -0.15

Table 1.4 (Continued)

Substituent

OCH2CH2CH3
C=O(C6H5)
NHCO2C4H9
S-Et
N(CF3)2
CHCl2
CH2CH=CH2
CH2I
NH-Bu
Cyclopropyl
C(CH3)=CH2
NCS
SCH2CH=CH2
C(OH)(CF3)2
SCH=CH2
NHC6H5
SCH(CH3)2
SCF3
OC=O(C6H5)
COOC6H5
Cyclobutyl
CHtCHO
C(F)(CF3)2
C6H4(NO2)-p
CH2OC6H5
N=NC6H5
SO2CF2CF3
CF2CF2CF2CF3
1-Cyclopentenyl
OCF2CHF2
C6H4(OCH3)-p
CH2Si(CH3)3
CH2C6H5
CH(CH3)(Et)
C6H4F-p
OCgHn
N(C3H7)2
OC6H5
C6H4N(CH3)2-p



1.05 


1.71 


1.05 


3.03 


1.07 


3.05 


1.07 


1.84 


1.08 


1.43 


1.09 


1.53 


1.10 


1.45 


1.10 


1.86 


1.10 


2.43 


1.11 


1.07 


1.12 


1.39 


1.14 


1.35 


1.14 


1.56 



0.20 


— 


0.46 




0.21 


— 


0.09 


— 


0.12 


— 


0.06 




0.34 




0.93 


— 


0.03 




0.56 


— 

Table 1.4 (Continued)

No.  Substituent          π     MR      L     B1     B5     σp     σm
217  Cyclopentyl       2.14   2.20   4.90   1.90   4.09   0.28   0.35
218  CHI2              2.15   3.15   4.36   1.95   4.15  -0.14  -0.05
219  SC6H5             2.32   3.43   4.57   1.70   6.42   0.26   0.26
220  1-Cyclohexenyl    2.33   2.67   6.16   2.23   3.30   0.07   0.23
221  OCCl3             2.36   2.18   5.44   1.35   4.41  -0.08  -0.10
222  C(Et)(CH3)2       2.37   2.42   4.92   2.60   3.49   0.35   0.43
223  CH2C(CH3)3        2.37   2.42   4.89   1.52   4.18  -0.18  -0.06
224  SC6H4NO2-p        2.39   4.11   4.92   1.70   7.86  -0.17  -0.05
225  SCF2CHF2          2.43   1.84   5.60   1.70   4.55   0.24   0.32
226  C6H4Cl-p          2.61   3.04   7.74   1.80   3.11  -0.07  -0.04
227  C6F5              2.62   2.40   6.87   1.71   3.67   0.12   0.15
228                    2.63   2.42   6.97   1.52   4.94   0.27   0.26
229  CCC6H5            2.65   3.32   8.88   1.71   3.11  -0.15  -0.08
230  CBr3              2.65   2.88   4.09   2.86   3.75   0.16   0.14
231  EtC6H5            2.66   3.47   8.33   1.52   3.58   0.29   0.28
232  C6H4(CH3)-p       2.69   3.00   7.09   1.84   3.11  -0.12  -0.07
233  C6H4I-p           3.02   3.91   8.45   2.15   3.11  -0.03   0.06
234  C6H4I-m           3.02   3.91   6.72   1.84   5.15   0.12   0.15
235  1-Adamantyl       3.37   4.03   6.17   3.16   3.49  -0.15  -0.05
236  C(Et)3            3.42   3.36   4.92   2.94   4.18   0.10   0.14
237  CH(C6H5)2         3.52   5.43   5.15   2.01   6.02   0.06   0.13
238  N(C6H5)2          3.61   5.50   5.77   1.35   5.95   0.01   0.08
239  Heptyl            3.69   3.36   9.03   1.52   6.39  -0.13  -0.12
240  C(SCF3)3          4.17   4.40   5.82   3.32   5.00  -0.20  -0.07
241  C6Cl5             4.96   4.95   7.74   1.81   4.48  -0.05  -0.03

methods are based on molecular fragments, 
atomic contributions, or computer-identified 
fragments (1, 106, 107, 144-147). Whole-mol- 
ecule approaches use molecular properties or 
spatial properties to predict log P values (148- 
150). They run on different platforms (e.g., 
Mac, PC, Unix, VAX, etc.) and use different 
calculation procedures. An extensive, recent 
review by Mannhold and van de Waterbeemd 
addresses the advantages and limitations of 
the various approaches (143). Statistical pa- 
rameters yield some insight as to the effective- 
ness of such programs. 

Recent attempts to compute log P
have resulted in the development of
solvatochromic parameters (151, 152). This
approach was proposed by Kamlet et al. and
focuses on molecular properties. In its
simplest form it can be expressed as follows:

$$\log P_{oct} = aV + b\pi^{*} + c\beta_H + d\alpha_H + e \tag{1.49}$$

V is a solute volume term; $\pi^{*}$ represents
the solute polarizability; $\beta_H$ and $\alpha_H$ are
measures of hydrogen bond acceptor strength and
hydrogen bond donor strength, respectively;
and e is the intercept. An extension of this
model has been formulated by Abraham and
used by researchers to refine molecular
descriptors and characterize hydrophobicity
scales (153-156).

3.3 Steric Parameters 

The quantitation of steric effects is complex at
best and challenging in all other situations,
particularly at the molecular level. An added
level of confusion comes into play when attempts
are made to delineate size and shape.
Nevertheless, sterics are of overwhelming
importance in ligand-receptor interactions as
well as in transport phenomena in cellular systems.
The first steric parameter to be quantified
and used in QSAR studies was Taft's $E_s$
constant (157). $E_s$ is defined as

$$E_s = \log(k_X / k_H) \tag{1.50}$$

where $k_X$ and $k_H$ represent the rates of acid
hydrolysis of esters, XCH2COOR and CH3COOR,
respectively. To correct for hyperconjugation
in the α-hydrogens of the acetate moiety,
Hancock devised a correction on $E_s$ such
that

$$E_s^{c} = E_s + 0.306(n - 3) \tag{1.51}$$

In Equation 1.51, n represents the number
of α-hydrogens and 0.306 is a constant
derived from molecular orbital calculations
(158). Unfortunately, the limited availability
of $E_s$ and $E_s^{c}$ values for a great number
of substituents precludes their usage in
QSAR studies. Charton demonstrated a
strong correlation between $E_s$ and van der
Waals radii, which led to his development of
the upsilon parameter $\upsilon_X$ (159).

$$\upsilon_X = r_X - r_H = r_X - 1.20 \tag{1.52}$$

where $r_X$ and $r_H$ are the minimum van der
Waals radii of the substituent and hydrogen,
respectively. Extension of this approach
from symmetrical substituents to nonsymmetrical
substituents must be handled with
caution.
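
The two corrections above are simple enough to express directly. In the sketch below, the function names are illustrative and the halogen radii are assumed textbook (Bondi-type) van der Waals values, not figures taken from this chapter; the 1.20 Å hydrogen radius and the 0.306 constant come from Equations 1.52 and 1.51.

```python
R_H = 1.20  # van der Waals radius of hydrogen in angstroms (Equation 1.52)

def charton_upsilon(r_x: float) -> float:
    """Equation 1.52: upsilon_X = r_X - r_H."""
    return r_x - R_H

def hancock_es_c(e_s: float, n_alpha_h: int) -> float:
    """Equation 1.51: E_s^c = E_s + 0.306 * (n - 3), n = number of alpha-hydrogens."""
    return e_s + 0.306 * (n_alpha_h - 3)

# Assumed minimum van der Waals radii (angstroms) for symmetrical substituents.
radii = {"H": 1.20, "F": 1.47, "Cl": 1.75, "Br": 1.85}
for name, r in radii.items():
    print(name, round(charton_upsilon(r), 2))
```

With three α-hydrogens (n = 3) the Hancock correction vanishes, so $E_s^c$ equals $E_s$ for the reference methyl group.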

One of the most widely used steric parameters
is molar refraction (MR), which has
been aptly described as a "chameleon" parameter
by Tute (160). Although it is generally
considered to be a crude measure of
overall bulk, it does incorporate a polarizability
component that may describe cohesion
and is related to London dispersion
forces as follows: $MR = 4\pi N\alpha/3$, where N is
Avogadro's number and α is the polarizability
of the molecule. It contains no information
on shape. MR is also defined by the
Lorentz-Lorenz equation:

$$MR = \frac{n^2 - 1}{n^2 + 2} \times \frac{MW}{\text{density}} \tag{1.53}$$

MR is generally scaled by 0.1 and used in
biological QSAR, where intermolecular effects
are of primary importance. The refractive
index of the molecule is represented by n. With
alkyl substituents, there is a high degree of
collinearity with hydrophobicity; hence, care
must be taken in the QSAR analysis of such
derivatives. The MR descriptor does not
distinguish shape; thus the MR value for amyl
(-CH2CH2CH2CH2CH3) is the same as that
for [-C(Et)(CH3)2]: 2.42. The coefficients
with MR terms challenge interpretation,
although extensive experience with this parameter
suggests that a negative coefficient
implies steric hindrance at that site and a
positive coefficient attests to either dipolar
interactions in that vicinity or anchoring of a
ligand in an opportune position for interaction
(161).
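
The Lorentz-Lorenz definition (Equation 1.53) can be evaluated as in the following sketch. The function name is illustrative, and the benzene constants are assumed handbook values used only to show the order of magnitude; they do not come from this chapter.

```python
def molar_refraction(n: float, mw: float, density: float) -> float:
    """Equation 1.53: MR = (n^2 - 1)/(n^2 + 2) * MW/density (cm^3/mol)."""
    n2 = n * n
    return (n2 - 1.0) / (n2 + 2.0) * mw / density

# Assumed values for benzene: refractive index ~1.501,
# MW 78.11 g/mol, density ~0.8765 g/mL.
mr = molar_refraction(1.501, 78.11, 0.8765)
mr_scaled = 0.1 * mr  # the 0.1 scaling customary in biological QSAR

print(round(mr, 1), round(mr_scaled, 2))
```

Because the expression depends only on bulk refractive index, molecular weight, and density, it carries no shape information, which is the limitation the text goes on to address.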

The failure of the MR descriptor to adequately
address three-dimensional shape issues
led to Verloop's development of STERIMOL
parameters (162), which define the
steric constraints of a given substituent along
several fixed axes. Five parameters were
deemed necessary to define shape: L, B1, B2,
B3, and B4. L represents the length of a substituent
along the axis of a bond between the
parent molecule and the substituent; B1 to B4
represent four different width parameters.
However, the high degree of collinearity between
B1, B2, and B3 and the large number of
training set members needed to establish the
statistical validity of this group of parameters
led to their demise in QSAR studies. Verloop
subsequently established the adequacy of just
three parameters for QSAR analysis: a slightly
modified length L, a minimum width B1, and a
maximum width B5 that is orthogonal to L
(163). The use of these insightful parameters
has done much to enhance correlations with
biological activities. Recent analysis in our
laboratory has established that in many cases,
B1 alone is superior to Taft's $E_s$ and a combination
of B1 and B5 can adequately replace $E_s$
(164).

Molecular weight (MW) terms have also
been used as descriptors, particularly in cellular
systems, or in distribution/transport studies
where diffusion is the mode of operation.
According to the Einstein-Sutherland equation,
molecular weight affects the diffusion
rate. The log MW term has been used extensively
in some studies (159-161), and an example
of such usage is given below. In correlating
permeability (Perm) of nonelectrolytes through
Chara cells, Lien et al. obtained the following
QSAR (168):




$$\log \mathrm{Perm} = 0.889 \log P^{*} - 1.544 \log MW - 0.144 H_b + 4.653 \tag{1.54}$$
$$n = 30,\quad r^2 = 0.899,\quad s = 0.322,\quad F = 77.39$$

In QSAR 1.54, log P* represents the olive oil/water
partition coefficient, MW is the molecular
weight of the solute and defines its size,
and $H_b$ is a crude approximation of the total
number of hydrogen bonds for each molecule.
The molecular weight descriptor has
also been an omnipresent variable in QSAR
studies pertaining to cross-resistance of various
drugs in multidrug-resistant cell lines
(169). $\sqrt[3]{MW}$ was used because it most
closely approximates the size (radii) of the
drugs involved in the study and their interactions
with GP-170. See QSAR 1.55.

$$\log CR = 0.70 \sqrt[3]{MW} - 1.01 \log(\beta \cdot 10^{\sqrt[3]{MW}} + 1) - 0.10 \log P + 0.38 I - 3.08 \tag{1.55}$$
$$n = 40,\quad r^2 = 0.794,\quad s = 0.344,\quad \log \beta = -6.851,\quad \text{optimum } \sqrt[3]{MW} = 7.21$$

3.4 Other Variables and Variable Selection 

Indicator variables (I) are often used to highlight
a structural feature present in some of
the molecules in a data set that confers unusual
activity or lack of it to these particular
members. Their use could be beneficial in
cases where the data set is heterogeneous and
includes large numbers of members with unusual
features that may or may not impact a
biological response. A QSAR for the inhibition of
trypsin by X-benzamidines used indicator
variables to denote the presence of unusual
features such as positional isomers and
vinyl/carbonyl-containing substituents (170). A recent
study on the inhibition of lipoxygenase-catalyzed
production of leukotriene B4 and
5-hydroxyeicosatetraenoic acid from arachidonic
acid in guinea pig leukocytes by X-vinyl catechols
led to the development of the following
QSAR (171):

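$$\log 1/C = 0.49(\pm 0.11) \log P - 0.75(\pm 0.22) \log\left(\beta \cdot 10^{\log P} + 1\right) - 0.62(\pm 0.18)\,D_2 - 1.13(\pm 0.20)\,D_3 + 5.50(\pm 0.33) \tag{1.56}$$

$$n = 51,\quad r^2 = 0.801,\quad s = 0.269,\quad \log P_0 = 4.61(\pm 0.49),\quad \log \beta = -4.33$$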

The indicator variables are $D_2$ and $D_3$; for simple X-catechols, $D_2 = 1$ and for X-naphthalene diols, $D_3 = 1$. The negative coefficients with both terms ($D_2$ and $D_3$) underscore the detrimental effects of these structural features in these inhibitors. Thus, discontinuities in the structural features of the molecules of this data set are accounted for by the use of indicator variables. An indicator variable may be visualized graphically as a constant that adjusts two parallel lines so that they are superimposable. The use of indicator variables in QSAR analysis is also described in the following example. An analysis of a comprehensive set of nitroaromatic and heteroaromatic compounds that induced mutagenesis in TA98 cells was conducted by Debnath et al., and QSAR 1.57 was formulated (172).

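$$\log \mathrm{TA98} = 0.65(\pm 0.16) \log P - 2.90(\pm 0.59) \log\left(\beta \cdot 10^{\log P} + 1\right) - 1.38(\pm 0.25)\,E_{\mathrm{LUMO}} + 1.88(\pm 0.39)\,I_1 - 2.89(\pm 0.81)\,I_a - 4.15(\pm 0.58) \tag{1.57}$$

$$n = 188,\quad r^2 = 0.810,\quad s = 0.886,\quad \log P_0 = 4.93(\pm 0.35),\quad \log \beta = -5.48$$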

TA98 represents the number of revertants per nanomole of nitro compound. $E_{\mathrm{LUMO}}$ is the energy of the lowest unoccupied molecular orbital and $I_a$ is an indicator variable that signifies the presence of an acenthrylene ring in the mutagens. $I_1$ is also an indicator variable that pertains to the number of fused rings in the data set. It acquires a value of 1 for all congeners containing three or more fused rings and a value of zero for those containing one or two fused rings (e.g., naphthalene, benzene). Thus, the greater the number of fused rings, the greater the mutagenicity of the nitro congeners. The $E_{\mathrm{LUMO}}$ term indicates that the lower the energy of the LUMO, the more potent the mutagen. In this QSAR the combination of indicator variables affords a mixed blessing. One variable helps to enhance activity, whereas the other leads to a decrease in mutagenicity of the acenthrylene congeners. In both these QSAR, Kubinyi's bilinear model is used (21). See Section 4.2 for a description of this approach.
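The reported optima of the two bilinear QSAR above can be checked from their coefficients. The sketch below assumes the bilinear form a·log P − b·log(β·10^(log P) + 1) + c, with all non-hydrophobic terms folded into c; setting its derivative to zero gives log P0 = log[a/(b − a)] − log β. The function names are illustrative and not taken from the cited work.

```python
import math


def bilinear(log_p, a, b, log_beta, c=0.0):
    """Bilinear model: a*logP - b*log10(beta*10**logP + 1) + c."""
    return a * log_p - b * math.log10(10 ** (log_beta + log_p) + 1) + c


def bilinear_optimum(a, b, log_beta):
    """Stationary point of the bilinear curve; requires b > a > 0."""
    return math.log10(a / (b - a)) - log_beta


# Coefficients quoted in QSAR 1.56 and QSAR 1.57.
opt_156 = bilinear_optimum(a=0.49, b=0.75, log_beta=-4.33)
opt_157 = bilinear_optimum(a=0.65, b=2.90, log_beta=-5.48)
print(f"QSAR 1.56: log P0 = {opt_156:.2f} (reported 4.61)")
print(f"QSAR 1.57: log P0 = {opt_157:.2f} (reported 4.93)")
```

Both computed optima agree with the reported values to within rounding of the published coefficients.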

3.5 Molecular Structure Descriptors 

These are truly structural descriptors because 
they are based only on the two-dimensional 
representation of a chemical structure. The 
most widely known descriptors are those that 
were originally proposed by Randic (173) and 
extensively developed by Kier and Hall (27). 
The strength of this approach is that the required information is embedded in the hydrogen-suppressed framework and thus no experimental measurements are needed to define molecular connectivity indices. For each bond the $C_k$ term is calculated. The summation of these terms then leads to the derivation of $\chi$, the molecular connectivity index for the molecule.

$$C_k = (\delta_i \delta_j)^{-0.5} \quad \text{where } \delta = \sigma - h \tag{1.58}$$

$\delta$ is the count of formally bonded carbons and h is the number of bonds to hydrogen atoms.

$${}^1\chi = \sum C_k = \sum (\delta_i \delta_j)^{-0.5} \tag{1.59}$$

${}^1\chi$ is the first-order index because it considers only individual bonds. Higher molecular connectivity indices encode more complex attributes of molecular structure by considering longer paths. Thus, ${}^2\chi$ and ${}^3\chi$ account for all two-bond paths and three-bond paths, respectively, in a molecule. To correct for differences in valence, Kier and Hall proposed a valence delta ($\delta^v$) term to calculate valence connectivity indices (175).
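The first-order index of Equation 1.59 can be sketched in a few lines. The example assumes a simple hydrocarbon represented as a list of bonds in its hydrogen-suppressed graph, so that each atom's δ is just its number of bonded neighbors; the atom numbering and function name are illustrative.

```python
def first_order_chi(bonds):
    """Sum (delta_i * delta_j) ** -0.5 over all bonds of a hydrogen-suppressed graph."""
    delta = {}
    for i, j in bonds:
        # delta counts the non-hydrogen neighbors of each atom
        delta[i] = delta.get(i, 0) + 1
        delta[j] = delta.get(j, 0) + 1
    return sum((delta[i] * delta[j]) ** -0.5 for i, j in bonds)


# Carbon skeletons only; atoms are numbered arbitrarily.
N_BUTANE = [(0, 1), (1, 2), (2, 3)]
ISOBUTANE = [(0, 1), (1, 2), (1, 3)]

print(f"n-butane  1-chi = {first_order_chi(N_BUTANE):.3f}")
print(f"isobutane 1-chi = {first_order_chi(ISOBUTANE):.3f}")
```

The branched isomer gives the smaller value, which illustrates how the index encodes branching without any experimental input.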

Molecular connectivity indices have been 
shown to be closely related to many physico- 
chemical parameters such as boiling points, 
molar refraction, polarizability, and partition 
coefficients (174, 176). Ten years ago, the E- 
State index was developed to define an atom- 
or group-centered numerical code to represent 
molecular structure (28). The E-State was established as a composite index encoding both electronic and steric properties of atoms in molecules. It reflects an atom's electronegativity, the electronegativity of proximal and distal atoms, and topological state. Extensions of
this method include the HE-State, atom-type 
E-State, and the polarity index Q. Log P 
showed a strong correlation with the Q index 
of a small set (n = 21) of miscellaneous com- 
pounds (28). Various models using electroto- 
pological indices have been developed to delin- 
eate a variety of biological responses 
(177-179). Some criticism has been leveled at 
this approach (180, 181). Chance correlations 
are always a problem when dealing with such 
a wide array of descriptors. The physico- 
chemical interpretation of the meaning of 
these descriptors is not transparent, although 
attempts have been made to address this 
issue (27). 

4 QUANTITATIVE MODELS 
4.1 Linear Models 

The correlation of biological activity with 
physicochemical properties is often termed an 
extrathermodynamic relationship. Because it 
follows in the line of Hammett and Taft equa- 
tions that correlate thermodynamic and re- 
lated parameters, it is appropriately labeled. 
The Hammett equation represents relation- 
ships between the logarithms of rate or equi- 
librium constants and substituent constants. 
The linearity of many of these relationships 
led to their designation as linear free energy 
relationships. The Hansch approach repre- 
sents an extension of the Hammett equation 
from physical organic systems to a biological 
milieu. It should be noted that the simplicity of the approach belies the tremendous complexity of the intermolecular interactions at play in the overall biological response.

Biological systems are a complex mix of heterogeneous phases. Drug molecules usually traverse many of these phases to get from the site of administration to the eventual site of action. Along this random-walk process, they perturb many other cellular components such as organelles, lipids, proteins, and so forth. These interactions are complex and vastly different from organic reactions in test tubes, even though the eventual interaction with a receptor may be chemical or physicochemical in nature. Thus, depending on the biological system involved (isolated receptor, cell, or whole animal), one expects the response to be multifactorial and complex. The overall process, particularly in vitro or in vivo, involves a mix of equilibrium and rate processes, a situation that defies easy separation and delineation.

Meyer and Overton were the first to attempt to get a grasp on biological responses by noting the relationship between oil/water partition coefficients and narcotic activity. Ferguson recognized that equitoxic concentrations of small organic molecules were markedly influenced by their phase distribution between the biophase and exobiophase. This concept was generalized in the form of Equation 1.60 and extended by Fujita to Equation 1.61 (182, 183).



$$C = kA^{m} \tag{1.60}$$

$$\log 1/C = m \log (1/A) + \text{constant} \tag{1.61}$$

C represents the equipotent concentration, k 
and m are constants for a particular system, 
and A is a physicochemical constant represen- 
tative of phase distribution equilibria such as 
aqueous solubility, oil/water partition coefficient, and vapor pressure. In examining a large and diverse number of biological systems, Hansch and coworkers defined a relationship (Equation 1.62) that expressed biological activity as a function of physicochemical parameters (e.g., partition coefficients of organic molecules) (19).



$$\log 1/C = a \log P + b \tag{1.62}$$

Model systems have been devised to elucidate the mode of interactions of chemicals with biological entities. Examples of linear models pertaining to nonspecific toxicity are described. The effects of a series of alcohols (ROH) have been routinely studied in many model and biological systems. See QSAR 1.63-1.67.

4.1.1 Penetration of ROH into Phosphatidylcholine Monolayers (184)

$$\log 1/C = 0.87(\pm 0.01) \log P + 0.66(\pm 0.01) \tag{1.63}$$

$$n = 4,\quad r^2 = 0.998,\quad s = 0.002$$

4.1.2 Changes in Signal of Labeled Ghost Membranes by ROH (185)

$$\log 1/C = 0.93(\pm 0.09) \log P - 0.41(\pm 0.16) \tag{1.64}$$

$$n = 6,\quad r^2 = 0.996,\quad s = 0.092$$

4.1.3 Induction of Narcosis in Rabbits by ROH (184)

$$\log 1/C = 0.72(\pm 0.16) \log P + 1.35(\pm 0.12) \tag{1.65}$$

$$n = 11,\quad r^2 = 0.924,\quad s = 0.142$$

4.1.4 Inhibition of Bacterial Luminescence by ROH (185)

$$\log 1/C = 1.10(\pm 0.07) \log P + 0.16(\pm 0.12) \tag{1.66}$$

$$n = 8,\quad r^2 = 0.996,\quad s = 0.103$$

4.1.5 Inhibition of Growth of Tetrahymena pyriformis by ROH (76, 186)

$$\log 1/C = 0.82(\pm 0.04)\,\mathrm{Clog}P + 0.89(\pm 0.10) \tag{1.67}$$

$$n = 34,\quad r^2 = 0.982,\quad s = 0.173$$

In all cases, there is a strong dependency on log $P_{\mathrm{oct}}$ because all these processes involve transport of alcohols through membranes. The low intercepts speak to the nonspecific nature of the alcohol-mediated toxic interaction. An equilibrium-pseudoequilibrium modeled by log P can be defined as shown in Fig. 1.1.

Figure 1.1. Log $P_{\mathrm{octanol}}$ mirrors Log $P_{\mathrm{bio}}$. (Schematic: an octanol/water-phase system characterized by $P_{\mathrm{octanol}}$ alongside a biophase/aqueous-phase system characterized by $P_{\mathrm{bio}}$.)

The Hammett-type relationship for this conceptual idea of distribution is

$$\log P_{\mathrm{bio}} = a \cdot \log P_{\mathrm{octanol}} + b \tag{1.68}$$

This postulate assumes that steric, hydrophobic, electronic, and hydrogen bonding factors that affect partitioning in the biophase are handled by the octanol/water system. Given that the biological response (log 1/C) is proportional to log $P_{\mathrm{bio}}$, then it follows that

$$\log 1/C = a \cdot \log P_{\mathrm{octanol}} + \text{constant} \tag{1.69}$$

Hansch and coworkers have amply demonstrated that Equation 1.69 applies not only to systems at or near phase distribution equilibrium but also to systems removed from equilibrium (184, 185).
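The alcohol QSAR 1.63-1.67 all share the linear form of Equation 1.69, so they can be compared side by side. The sketch below stores only the slopes and intercepts quoted in the text; the dictionary layout, the function name, and the probe value log P = 1.0 are illustrative assumptions. Note that QSAR 1.67 is expressed in Clog P rather than measured log P.

```python
# (slope, intercept) of Log 1/C = slope * log P + intercept, as quoted in QSAR 1.63-1.67
LINEAR_QSAR = {
    "1.63": (0.87, 0.66),   # penetration into phosphatidylcholine monolayers
    "1.64": (0.93, -0.41),  # signal change in labeled ghost membranes
    "1.65": (0.72, 1.35),   # narcosis in rabbits
    "1.66": (1.10, 0.16),   # inhibition of bacterial luminescence
    "1.67": (0.82, 0.89),   # growth inhibition of T. pyriformis (Clog P)
}


def predict_log_inv_c(equation, log_p):
    """Evaluate one of the linear QSAR at a given log P."""
    slope, intercept = LINEAR_QSAR[equation]
    return slope * log_p + intercept


mean_slope = sum(slope for slope, _ in LINEAR_QSAR.values()) / len(LINEAR_QSAR)

for name in LINEAR_QSAR:
    # log P = 1.0 is an arbitrary probe value chosen for illustration
    print(name, round(predict_log_inv_c(name, 1.0), 2))
print("mean slope:", round(mean_slope, 3))
```

The slopes cluster close to unity across very different test systems, which is the numerical expression of the strong dependence on hydrophobicity.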



4.2 Nonlinear Models

Extensive studies on development of linear models led Hansch and coworkers to note that a breakdown in the linear relationship occurred when a greater range in hydrophobicity was assessed with particular emphasis placed on test molecules at extreme ends of the hydrophobicity range. Thus, Hansch et al. suggested that the compounds could be involved in a "random-walk" process: low hydrophobic molecules had a tendency to remain in the first aqueous compartment, whereas highly hydrophobic analogs sequestered in the first lipoidal phase that they encountered. This led to the formulation of a parabolic equation, relating biological activity and hydrophobicity (187).

$$\log 1/C = -a(\log P)^2 + b \cdot \log P + \text{constant} \tag{1.70}$$

In the random-walk process, the compounds partition in and out of various compartments and interact with myriad biological components in the process. To deal with this conundrum, Hansch proposed a general, comprehensive equation for QSAR, Equation 1.71 (188).

$$\log 1/C = -a(\log P)^2 + b \cdot \log P + \rho\sigma + \delta E_s + \text{constant} \tag{1.71}$$

The optimum value of log P for a given system is log $P_0$ and it is highly influenced by the number of hydrophobic barriers a drug encounters in its walk to its site of action. Hansch and Clayton formulated the following parabolic model to elucidate the narcotic action of alcohols on tadpoles (189).

4.2.1 Narcotic Action of ROH on Tadpoles

$$\log 1/C = 1.38(\pm 0.34) \log P - 0.08(\pm 0.07)(\log P)^2 + 0.52(\pm 0.34) \tag{1.72}$$

$$n = 10,\quad r^2 = 0.990,\quad s = 0.210,\quad \log P_0 = 8.69\ (5.78\text{-}43.43)$$

This is an example of nonspecific toxicity where the last step probably involves partitioning into a hydrophobic membrane. Log $P_0$ represents the optimal hydrophobicity (as defined by log P) that elicits a maximal biological response.
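For a parabola of the form −a(log P)² + b·log P + c, the optimum follows from setting the derivative to zero: log P0 = b/(2a). The short sketch below applies this to the rounded coefficients of QSAR 1.72; the function name is illustrative.

```python
def parabolic_optimum(a, b):
    """log P0 for Log 1/C = -a*(logP)**2 + b*logP + c, with a > 0."""
    return b / (2 * a)


# Rounded coefficients quoted in QSAR 1.72 (a = 0.08, b = 1.38).
log_p0 = parabolic_optimum(a=0.08, b=1.38)
print(f"log P0 from rounded coefficients: {log_p0:.2f} (reported 8.69)")
```

The small gap between the value computed here and the reported 8.69 is what one would expect if the published optimum was derived from unrounded coefficients; the very wide confidence interval quoted for log P0 reflects how poorly the small quadratic coefficient is determined.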

Despite the success of the parabolic equation, there are a number of worrisome limitations. First, this approach forces the data into a symmetrical parabola, with the result that there are usually deviations between the experimental and parabola-calculated data. Second, the ascending slope is curved and inconsistent with the observed linear data. Thus, the slope of a linear model cannot be compared to the curved slope of the parabola. In 1973 Franke devised a sophisticated, empirical model consisting of a linear ascending part and a parabolic part (190). See Equations 1.73 and 1.74.



$$\log 1/C = a \cdot \log P + c \quad (\text{if } \log P < \log P_x) \tag{1.73}$$

$$\log 1/C = -a(\log P)^2 + b \cdot \log P + c \quad (\text{if } \log P > \log P_x) \tag{1.74}$$





The binding of drugs to proteins is linearly dependent on hydrophobicity up to a limited value, log $P_x$, after which steric hindrance causes the linear dependency to alter to a nonlinear one. The major limitation of this approach involves the inclusion of highly hydrophobic congeners that tend to cause systematic deviations between experimental and predicted values.

Another cutoff model, which deals with nonlinearity in biological systems, is one defined by McFarland (191). It attempts to elucidate the dependency of drug transport on hydrophobicity in multicompartment models. McFarland addressed the probability of drug molecules traversing several aqueous lipid barriers from the first aqueous compartment to a distant, final aqueous compartment. The probability $P_{0,n}$ of a drug molecule to access the final compartment n of a biological system was used to define the drug concentration in this compartment.

$$\log C_R = a \cdot \log P - 2a \cdot \log(P + 1) + \text{constant} \tag{1.75}$$
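The shape of Equation 1.75 can be explored numerically. The sketch below assumes a = 1 and a zero constant purely for illustration, scans a grid of log P values, and estimates the limiting slopes far from the maximum; the function name and grid are not from the original work.

```python
import math


def mcfarland(log_p, a=1.0, constant=0.0):
    """Log C_R = a*logP - 2a*log10(P + 1) + constant, with P = 10**logP."""
    return a * log_p - 2 * a * math.log10(10 ** log_p + 1) + constant


# Scan log P from -3 to 3 in steps of 0.01 and locate the maximum.
grid = [i / 100 for i in range(-300, 301)]
log_p_at_max = max(grid, key=mcfarland)

# Change in Log C_R per unit log P far to the left and far to the right.
left_slope = mcfarland(-5.0) - mcfarland(-6.0)
right_slope = mcfarland(6.0) - mcfarland(5.0)

print(log_p_at_max, round(left_slope, 3), round(right_slope, 3))
```

With these assumed parameters the curve is symmetric about its maximum, with limiting slopes of equal magnitude and opposite sign.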

The ascending and descending slopes are equal (= 1) and linear. However, a major drawback of this model is that it forces the activity curves to maximize at log P = 0. These studies were extended by Kubinyi, who developed the elegant and powerful bilinear model, which is superior to the parabolic model and is extensively used in QSAR studies (192).



$$\log 1/C = a \cdot \log P - b \cdot \log(\beta \cdot P + 1) + \text{constant} \tag{1.76}$$

where $\beta$ is the ratio of the volumes of the organic phase and the aqueous phase. An important feature of this model lies in the symmetry of the curves. For aqueous phases of this model system, symmetrical curves with linear ascending and descending sides (like a teepee) and a limited parabolic section around the hydrophobicity optimum are generated. Unsymmetrical curves arise for the lipid phases. It is highly compatible with the linear model and allows for quick comparisons of the ascending slopes. It can also be used with other parameters such as MR and $\sigma$, where it appears to pinpoint a change in mechanism similar to the breaks in linearity of the Hammett equation. The following example of the bilinear model reveals the symmetrical nature of the curve.

4.2.2 Induction of Ataxia in Rats by ROH

$$\log 1/C = 0.77(\pm 0.10) \log P - 1.53(\pm 0.12) \log(\beta \cdot P + 1) + 1.68(\pm 0.12) \tag{1.77}$$

$$n = 35,\quad r^2 = 0.887,\quad s = 0.165,\quad \log P_0 = 2.0$$

The bilinear model has been used to model biological interactions in isolated receptor systems and in adsorption, metabolism, elimination, and toxicity studies, although it has a few limitations. These include the need for at least 15 data points (because of the presence of the additional disposable parameter $\beta$) and data points beyond optimum log P. If the range in values for the dependent variable is limited, unreasonable slopes are obtained.

4.3 Free-Wilson Approach

The Free-Wilson approach is truly a structure-activity-based methodology because it incorporates the contributions made by various structural fragments to the overall biological activity (22, 193, 194). It is represented by Equation 1.78.

$$\mathrm{BA}_i = \sum_j a_j X_{ij} + \mu \tag{1.78}$$

Indicator variables are used to denote the presence or absence of a particular structure feature. Like classical QSAR, this de novo approach assumes that substituent effects are additive and constant. BA is the biological activity; $X_j$ is the jth substituent, which carries a value 1 if present, 0 if absent. The term $a_j$ represents the contribution of the jth substituent to biological activity and $\mu$ is the overall average activity. The summation of all activity contributions at each position must equal zero. The series of linear equations that are formulated are solved by linear regression analysis. It is necessary for each substituent to appear more than once at a position in different combinations with substituents at other positions.

There are certain advantages to the Free-Wilson method that have been addressed (193-195). Any type of quantitative biological data can be subject to such analysis. There is no need for any physicochemical constants. The molecules of a series may be structurally dissected in any way and multiple sites of substitution are necessary and easily accommodated (196). Limitations include the large number of molecules with varying substituent combinations that are needed for this analysis and the inability of the system to handle nonlinearity of the dependency of activity on substituent properties. Intramolecular interactions between the substituents are not handled very well, although special treatments can be used to accommodate proximal effects. Extrapolation outside of the substituents used in the study is not feasible. Another problem inherent with this approach is that usually a large number of variables is required to describe a smaller number of compounds, which creates a statistical faux pas. Fujita and Ban modified this approach in two important ways (23). They expressed the biological activity on a logarithmic scale, to bring it into line with the extrathermodynamic approach, as seen in the following equation:

$$\log 1/C = \sum a_i X_i + \mu \tag{1.79}$$

This allowed the derived substituent constants to be compared with other free energy-related parameters. The overall average term $\mu$ took on a new look, as it were, akin to an intercept in other QSAR analyses.
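A Fujita-Ban style analysis reduces to ordinary least squares on a table of 0/1 indicator variables. The sketch below is a minimal illustration under stated assumptions: the six compounds and their activities are hypothetical and constructed to be exactly additive, hydrogen is taken as the reference substituent at both positions, and the normal equations are solved with a small hand-written Gauss-Jordan routine to avoid external dependencies.

```python
# Hypothetical data: (substituent at position 1, substituent at position 2, log 1/C)
COMPOUNDS = [
    ("H", "H", 5.0), ("Cl", "H", 5.6), ("CH3", "H", 5.3),
    ("H", "OH", 4.6), ("Cl", "OH", 5.2), ("CH3", "OH", 4.9),
]
# One indicator column per non-reference substituent; H is the reference.
COLUMNS = [(0, "Cl"), (0, "CH3"), (1, "OH")]


def design_row(compound):
    """Intercept term followed by 0/1 indicators for each substituent column."""
    return [1.0] + [1.0 if compound[pos] == sub else 0.0 for pos, sub in COLUMNS]


def least_squares(rows, y):
    """Solve the normal equations (X^T X) w = X^T y by Gauss-Jordan elimination."""
    n = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(n)]
    for col in range(n):
        # partial pivoting: move the largest remaining entry onto the diagonal
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


rows = [design_row(c) for c in COMPOUNDS]
activities = [c[2] for c in COMPOUNDS]
mu, a_cl, a_ch3, a_oh = least_squares(rows, activities)
print(round(mu, 2), round(a_cl, 2), round(a_ch3, 2), round(a_oh, 2))
```

Because the hypothetical activities were built from additive contributions, the regression recovers them exactly; with real data the residuals would indicate how well the additivity assumption holds.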



Recent analyses of a Free-Wilson type have included the in vitro inhibitory activity of a series of heterocyclic compounds against K. pneumoniae (197). Other applications of the Free-Wilson approach have included studies on the antimycobacterial activity of 4-alkylthiobenzanilides, the antibacterial activity of fluoronaphthyridines, and the benzodiazepine receptor-binding ability of some non-benzodiazepine compounds such as 3-X-imidazo[1,2-b]pyridazines, 2-phenylimidazo[1,2-a]pyridines, 2-(alkoxycarbonyl)imidazo[2,1-b]benzothiazoles, and 2-arylquinolones (198-200).

4.4 Other QSAR Approaches 

The similarity in approaches of Hansch anal- 
ysis and Free-Wilson analysis allows them to 
be used within the same framework. This is 
based on their theoretical consistency and the 
numerical equivalencies of activity contribu- 
tions. This development has been called the 
mixed approach and can be represented by the 
following equation: 

$$\log 1/C = \sum a_i + \sum c_j \Phi_j + \text{constant} \tag{1.80}$$

The term $a_i$ denotes the contribution for each ith substituent, whereas $\Phi_j$ is any physicochemical property of a substituent $X_j$. For a thorough review of the relationship between Hansch and Free-Wilson analyses, see the excellent reviews by Kubinyi (58, 195). A recent study of the P-glycoprotein inhibitory activity of 48 propafenone-type modulators of multidrug resistance, using a combined Hansch/Free-Wilson approach, was deemed to have higher predictive ability than that of a stand-alone Free-Wilson analysis (201). Molar refractivity, which has a high collinearity with molecular weight, was a significant determinant of modulating ability. It is of interest to note that molecular weight has been shown to be an omnipresent parameter in cross-resistance profiles in multidrug-resistance phenomena (167).

5 APPLICATIONS OF QSAR 

Over the last 40 years, the glut in scientific information has resulted in the development of thousands of equations pertaining to structure-activity relationships in biological systems. In its original definition, the Hansch equation was defined to model drug-receptor interactions involving electronic, steric, and hydrophobic contributions. Nonlinear relationships helped refine this approach in cellular systems and organisms where pharmacokinetic constraints had to be considered and tackled. They have also found increased utility in addressing the complex QSAR of some receptor-ligand interactions. In many cases the Kubinyi bilinear model has provided a sophisticated approach to delineation of steric effects in such interactions. Examples of ligand-receptor interactions will be drawn from receptors such as the much-studied dihydrofolate reductases (DHFR), $\alpha$-chymotrypsin, and 5$\alpha$-reductase (202-204).

Figure 1.2. 4,6-Diamino-1,2-dihydro-2,2-dimethyl-1R-s-triazines.

5.1 Isolated Receptor Interactions 

The critical role of DHFR in protein, purine, and pyrimidine synthesis; the availability of crystal structures of binary and ternary complexes of the enzyme; and the advent of molecular graphics combined to make DHFR an attractive target for well-designed heterocyclic ligands generally incorporating a 2,4-diamino-1,3-diaza pharmacophore (205). The earliest study focused on the inhibition of DHFR by 4,6-diamino-1,2-dihydro-2,2-dimethyl-1R-s-triazines, the structure of which is shown in Fig. 1.2 (202).

5.1.1 Inhibition of Crude Pigeon Liver 
DHFR by Triazines (202) 

$$\log 1/\mathrm{IC}_{50} = 2.21(\pm 1.00)\,\pi - 0.28(\pm 0.17)\,\pi^2 + 0.84(\pm 0.76)\,D + 2.58(\pm 1.30) \tag{1.81}$$

$$n = 15,\quad r^2 = 0.861,\quad s = 0.553,\quad \pi_0 = 4\ (3.6\text{-}6.0)$$

In all equations, n is the number of data points, $r^2$ is the square of the correlation coefficient, s represents the standard deviation, and the figures in parentheses are for construction of the 95% confidence intervals. $\pi$ represents the hydrophobicity of the substituent R and $\pi_0$ is the optimum hydrophobic contribution of the R substituent. D is an indicator variable that acquires a value of 1.0 when a phenyl ring is present on the nitrogen and a value of zero for all other R. This is an example of a Hansch-Fujita-Ban analysis, where the indicator variable D establishes the contribution and thus the importance of a phenyl ring in DHFR inhibition. This equation has some limitations. Improper choice of N-substituents led to a high degree of collinearity between size and hydrophobicity and, in terms of electronic contributions, spanned space was limited and thus inadequate. A subsequent study on the binding of these compounds to DHFR isolated from chicken liver was more revealing.

5.1.2 Inhibition of Chicken Liver DHFR by 
3-X-Triazines (207) 

$$\log 1/K_i = 1.01(\pm 0.14)\,\pi' - 1.16(\pm 0.19) \log\left(\beta \cdot 10^{\pi'} + 1\right) + 0.86(\pm 0.57)\,\sigma + 6.33(\pm 0.14) \tag{1.82}$$

$$n = 59,\quad r^2 = 0.821,\quad s = 0.906,\quad \pi'_0 = 1.89(\pm 0.36),\quad \log \beta = -1.08$$

In this example, the R group on the 2-nitrogen was restricted to a (3-X-phenyl) aromatic ring (205). Accurate $K_i$ values were obtained from highly purified DHFR isolated from chicken liver. In most cases, $\pi'$ represented the hydrophobicity of the substituent except in certain instances where X = -OR or -CH2ZC6H4-Y. It was ascertained that alkoxy substituents were not making direct hydrophobic contact with the enzyme, given that their inhibitory activities were essentially constant from the methoxy to the nonyloxy substituent. In the bridged substituents where Z = O, NH, S, Se, the Y substituent again did not contact the enzyme surface. Variation in Y led to the same, constant biological activity. The coefficient with $\pi'$ suggests that the substituent is engulfed in a hydrophobic pocket that has an optimal $\pi'_0$ of 2. This value is consistent with that seen in the crude pigeon liver DHFR corrected for the presence of the phenyl group (4.0 - 2.0 = 2). The 0.86 $\rho$ value (coefficient with $\sigma$) suggests that there could be a dipolar interaction between the electron-deficient phenyl ring and a region of positively charged electrostatic potential in the enzyme, perhaps an arginine, lysine, or histidine residue. Hathaway et al. developed a QSAR for the inhibition of human DHFR by 3-X-triazines and obtained Equation 1.83 (208).

5.1.3 Inhibition of Human DHFR by 3-X- 
Triazines (208) 

$$\log 1/K_i = 1.07(\pm 0.23)\,\pi' - 1.10(\pm 0.26) \log\left(\beta \cdot 10^{\pi'} + 1\right) + 0.50(\pm 0.19)\,I + 0.82(\pm 0.66)\,\sigma + 6.07(\pm 0.21) \tag{1.83}$$

$$n = 60,\quad r^2 = 0.792,\quad s = 0.308,\quad \pi'_0 = 2.0(\pm 0.87),\quad \log \beta = -0.577$$

The enhanced activity of the "bridged" substituents was corrected by the indicator variable I. Note that triazines bearing the bridge moieties -CH2NHC6H4Y, -CH2OC6H4Y, and -CH2SC6H4Y had unusually high enzyme binding activity. The -CH2NHC6H5 bridge is present in the endogenous substrate, folic acid. The bilinear dependency on hydrophobicity of the substituents parallels that seen in the case of chicken liver DHFR. A similar QSAR was obtained for DHFR isolated from L1210 murine leukemia cells (209).



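5.1.4 Inhibition of L1210 DHFR by 3-X-Triazines (209)

$$\log 1/K_i = 0.98(\pm 0.14)\,\pi' - 1.14(\pm 0.20) \log\left(\beta \cdot 10^{\pi'} + 1\right) + 0.79(\pm 0.57)\,\sigma + 6.12(\pm 0.14) \tag{1.84}$$

$$n = 58,\quad r^2 = 0.810,\quad s = 0.264,\quad \pi'_0 = 1.76(\pm 0.28),\quad \log \beta = -0.979$$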

The consistency in these models versus prokaryotic DHFR is established by the coefficient with the hydrophobic term, the optimum $\pi'$ value, and the $\rho$ value. These numerical coefficients can be contrasted sharply with those obtained from fungal and protozoal DHFR. Inhibition constants were determined for 3-X-triazines versus Pneumocystis carinii DHFR (210).

5.1.5 Inhibition of P. carinii DHFR by 3-X- 
Triazines (210) 

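$$\log 1/K_i = 0.73(\pm 0.12)\,\pi' - 1.36(\pm 0.35) \log\left(\beta \cdot 10^{\pi'} + 1\right) - 0.78(\pm 0.42)\,I_{OR} + 0.28(\pm 0.21)\,\mathrm{MR}_Y + 6.48(\pm 0.23) \tag{1.85}$$

$$n = 43,\quad r^2 = 0.840,\quad s = 0.435,\quad \pi'_0 = 3.99(\pm 0.68),\quad \log \beta = -3.925$$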

In Equation 1.85, $I_{OR}$ is an indicator variable that assumes a value of 1 when an alkoxy substituent is present and 0 for all other substituents. It is of interest to note that the Y substituent on the second phenyl ring now contributes to activity. The $\mathrm{MR}_Y$ term suggests that it most probably accesses a polar region of the active site of the enzyme. The positive coefficient with $\mathrm{MR}_Y$ suggests that an increase in bulk and/or polarizability enhances binding. The descending slope of the bilinear equation is much steeper (1.36 - 0.73 = 0.63) than that seen with the mammalian and avian enzymes.

A similar model is obtained vs. the bifunctional protozoal DHFR from Leishmania major, which is coupled to thymidylate synthase (211).

5.1.6 Inhibition of L. major DHFR by 3-X-Triazines (211)

$$\log 1/K_i = 0.65(\pm 0.08)\,\pi' - 1.22(\pm 0.29) \log\left(\beta \cdot 10^{\pi'} + 1\right) - 1.12(\pm 0.29)\,I_{OR} + 0.58(\pm 0.16)\,\mathrm{MR}_Y + 5.05(\pm 0.16) \tag{1.86}$$

$$n = 41,\quad r^2 = 0.931,\quad s = 0.298,\quad \pi'_0 = 4.54,\quad \log \beta = -4.491$$

QSAR analysis on a limited set of 3-X-triazines assayed by Chio and Queener versus Toxoplasma gondii led to the formulation of Equation 1.87 (202, 212).

5.1.7 Inhibition of T. gondii DHFR by 3-X-Triazines

$$\log 1/\mathrm{IC}_{50} = 0.39(\pm 0.20)\,\pi' - 0.43(\pm 0.19)\,\mathrm{MR}_Y + 6.65(\pm 0.30) \tag{1.87}$$

$$n = 17,\quad r^2 = 0.810,\quad s = 0.289$$

A quick comparison of QSAR 1.82-1.84 reveals the strong similarity between the avian and mammalian models. In fact, because of its increased stability, chicken liver DHFR has often been used as a surrogate for human DHFR in enzyme-inhibition studies. The intercepts, coefficients with $\pi'$, and optimum $\pi'_0$ for avian (6.33, 1.01, 1.9), human (6.07, 1.07, 2.0), and mouse leukemia (6.12, 0.98, 1.76) can be compared to the corresponding values for P. carinii (6.48, 0.73, 3.99) and Leishmania major (5.05, 0.65, 4.54). QSAR 1.81 and 1.87 are not included in the comparison because crude pigeon enzyme was used in the former and the testing for QSAR 1.87 was conducted under different assay conditions and $K_i$ values were not determined. A noteworthy difference between these models is the wide disparity in $\pi'_0$ values. The binding site of the protozoal and fungal species comprises an extensive hydrophobic surface unlike the abbreviated pockets in the mammalian and avian enzymes. The positive coefficients with the $\mathrm{MR}_Y$ terms suggest that added bulk on the bridged phenyl ring enhances inhibitory potency. The study versus T. gondii DHFR (QSAR 1.87) included a number of mostly small, polar substituents (NH2, NO2, CONMe2) on the bridged phenyl and their activities were considerably lower than the unsubstituted analog. Comparative QSAR can be useful, particularly if the biological data are consistent (tested under the same assay conditions, excellent purity of enzymes, substrates, inhibitors, buffers), and the choice of substituents is appropriate.
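The species comparison can be made concrete by tabulating the ascending slope a and the limiting descending slope a − b of each bilinear QSAR (for large $\pi'$ the bilinear term contributes a slope of −b). The coefficients below are those quoted in QSAR 1.82-1.86, with the L1210 ascending coefficient taken from the comparison in the text; the dictionary layout and function name are illustrative.

```python
# (a, b): coefficient of pi' and coefficient of the bilinear log term
DHFR_QSAR = {
    "chicken liver (1.82)": (1.01, 1.16),
    "human (1.83)": (1.07, 1.10),
    "murine L1210 (1.84)": (0.98, 1.14),
    "P. carinii (1.85)": (0.73, 1.36),
    "L. major (1.86)": (0.65, 1.22),
}


def descending_slope(ascending, bilinear_coefficient):
    """Limiting slope beyond the hydrophobic optimum: a - b."""
    return ascending - bilinear_coefficient


slopes = {name: round(descending_slope(a, b), 2) for name, (a, b) in DHFR_QSAR.items()}
for name, slope in slopes.items():
    print(f"{name:22s} ascending {DHFR_QSAR[name][0]:.2f}  descending {slope:+.2f}")
```

The avian and mammalian enzymes show nearly flat descending limbs, whereas the fungal and protozoal enzymes fall off much more steeply beyond their higher optima.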

One of the major problems that arises with some QSAR studies is extrapolation beyond spanned space. Predictive ability is sound when one has probed an adequate range in electronic, hydrophobic, and steric space. At the onset of the study, the training set should address these concerns. Lack of adequate attention to such issues can result in QSAR models that are misleading. When examined on its own, such a model may appear to withstand statistical rigor and apparent transparency but, on being subjected to lateral validation, loopholes emerge. A brief study to illustrate this phenomenon is outlined below.
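A simple guard against extrapolation is to compare each descriptor of a candidate molecule with the range covered by the training set. The sketch below uses hypothetical descriptor values throughout; the function name, descriptor labels, and numbers are assumptions made only to show the check, and a range test of this kind is a crude stand-in for a fuller analysis of spanned space.

```python
def outside_spanned_space(training_set, candidate):
    """Return the descriptors whose candidate value lies outside the training range."""
    flagged = []
    for name, values in training_set.items():
        if not min(values) <= candidate[name] <= max(values):
            flagged.append(name)
    return flagged


# Hypothetical descriptor values for a small training set.
training = {
    "pi": [-0.5, 0.0, 0.7, 1.2],
    "sigma": [-0.2, 0.0, 0.2, 0.5],
    "MR": [0.1, 0.6, 1.5, 3.4],
}
# Hypothetical candidate with a bulky substituent.
candidate = {"pi": 0.9, "sigma": 0.1, "MR": 6.4}

print(outside_spanned_space(training, candidate))
```

A prediction for such a candidate would rest on an untested region of steric space, so it should be treated with caution.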

Four different QSAR were derived for the 
inhibition of DHFR from rat liver, human leu- 
kemia, mouse L1210, and bovine liver by 2,4- 
diamino, 5-Y, 6-Z-quinazolines (Fig. 1.3) (202, 
213-215). A comparison of their QSAR pre- 
sents an interesting study on the importance 
of spanned space in delineating enzyme-recep- 
tor interactions. 




Figure 1.3. 2,4-Diainino, 5-Y, 6-Z-quinazolines. 
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5.1.8 inhibition of Rat Liver DHFR by 2^4- 
Diamino^ 5-Y, 6-Z-quinazoiines(213) 



5.1.1 1 inhibition of Bovine Liver DHFR by 

2,4-Diamino, 5-Y, 6-Z-quinazoiines(215) 



Log l/ICso 

= 0.78( ±0.12)775 
+ 0.81(±0.12)MR6 



Log l/ICso = 0.70(±0.24)MRe 
+ 4.72(±0.59) 



(1.91) 



n = ll, r^ = 0.823, s = 0.420 



- 0.06(±0.02)MRe2 (1.88) 

- 0.73(±0.49)7i - 2.15(±0.38)/2 

- 0.54(±0. 21)73 - 1.40(±0.41)74 
+ 0.78(±0.37)7e 

- 0.20(±0.12)MR6-7 
+ 4.92(±0.23) 

n = 101, = 0.924, s =0.441, 

MR6,o = 6.4(±0.8) 

5.1.9 inhibition of Human Liver DHFR by 

2,4-Diamino, 5-Y, 6-Z-quinazoiines(214) 

Log X/Ki 

- -2.87(±0.16)7i 
+ 0.29(±0.14)72 

(1.89) 

-0,38(±0.11)MR6 

- 0.29(±0.06)t7r 

- 0.19(±0.07)MRr+ 10.12(±0.45) 

71 = 47, r^ = 0.914, s = 0.420 



These QSAR vary in size and in the number of variables used to define
inhibitory activity. Selassie and Klein have described a more thorough
comparative analysis of these QSAR (202). A brief focus on the $MR_6$
term reveals that its coefficients vary remarkably in all four sets.
QSAR 1.88 is a parabola with an optimum of 6.4. Because it is parabolic
in nature, the coefficient of the ascending slope cannot be compared
with the linear slopes in QSAR 1.89-1.91. Figure 1.4 illustrates the
problems with QSAR 1.89-1.91, which failed to test analogs across the
available space.

Figure 1.4 reveals that QSAR 1.89 and 1.90 were sampled in the
suboptimal $MR_6$ range; thus, the negative dependency on $MR_6$. On the
other hand, QSAR 1.91 was focused on the ascending portion of the curve
and thus only molecules in the 0.1-3.4 range were tested. Thus, with a
limited set of compounds, one gets a misleading picture of the
biological interactions.
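The effect of incomplete spanned space can be illustrated numerically. The following is a minimal sketch, not a reanalysis of the original data: it takes only the $MR_6$ terms and intercept of QSAR 1.88 as printed, samples the resulting parabola over two hypothetical windows, and fits a straight line to each. The window limits, the number of points, and the function names are assumptions made for illustration. With the rounded coefficients printed above, the vertex of the parabola falls near 6.75 rather than exactly at the reported optimum of 6.4.

```python
# Sketch: fitting a straight line to only part of a parabolic MR6
# dependence gives a slope whose sign depends on the sampled range.

def parabola(mr6, a=0.81, b=-0.06, c=4.92):
    """MR6 terms and intercept of QSAR 1.88 (rounded coefficients)."""
    return a * mr6 + b * mr6 ** 2 + c

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def sample(lo, hi, steps=12):
    """Evenly spaced hypothetical MR6 values and their modeled activity."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return xs, [parabola(x) for x in xs]

# Hypothetical sampling windows (assumptions, not the published data sets)
low_x, low_y = sample(0.1, 3.4)      # ascending limb only
high_x, high_y = sample(8.0, 12.0)   # beyond the optimum only

slope_low, _ = fit_line(low_x, low_y)
slope_high, _ = fit_line(high_x, high_y)

print(f"slope on 0.1-3.4:  {slope_low:+.2f}")
print(f"slope on 8.0-12.0: {slope_high:+.2f}")
```

Both fits describe the same underlying curve, yet one yields a positive and the other a negative coefficient for $MR_6$, which is the kind of discrepancy that lateral comparison of models brings to light.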

5.1.10 Inhibition of Murine L1210 DHFR by 2,4-Diamino, 5-Y, 6-Z-quinazolines (214)

$$
\begin{aligned}
\log 1/IC_{50} ={}& 0.49(\pm 0.11)I_2 - 1.23(\pm 0.25)I_3 - 0.30(\pm 0.07)MR_6 \\
& - 0.12(\pm 0.04)I \cdot \pi + 9.36(\pm 0.27)
\end{aligned}
\tag{1.90}
$$

$$n = 24,\quad r^2 = 0.817,\quad s = 0.235$$

Figure 1.4. Gaps in spanned space of MR6 for 2,4-diamino-quinazolines.

Enzymatic reactions in nonaqueous solvents have generated a great deal
of interest, fueled in part by the commercial application of enzymes as
catalysts in specialty synthesis. The increasing demand for enantiopure
pharmaceuticals has accelerated the study of enzymatic reactions in
organic solvents containing little or no water (216). To investigate the
substrate specificity of α-chymotrypsin in pentanol, a series of
X-phenyl esters of N-benzoyl-L-alanine (Fig. 1.5) were synthesized and
their binding constants were evaluated in buffer and in pentanol (203).
QSAR 1.92 and 1.93 were derived in phosphate buffer and pentanol,
respectively.



5.1.12 Binding of X-Phenyl, N-Benzoyl-L-alaninates to α-Chymotrypsin in Phosphate Buffer, pH 7.4 (203)

$$\log 1/K_M = 0.28(\pm 0.11)\pi + 0.51(\pm 0.24)\sigma^- + 0.38(\pm 0.23)MR + 3.70(\pm 0.24) \tag{1.92}$$

$$n = 16,\quad r^2 = 0.834,\quad s = 0.198$$

5.1.13 Binding of X-Phenyl, N-Benzoyl-L-alaninates to α-Chymotrypsin in Pentanol (203)

$$\log 1/K_M = 0.25(\pm 0.09)\pi + 0.24(\pm 0.18)\sigma^- + 4.10(\pm 0.09) \tag{1.93}$$

$$n = 17,\quad r^2 = 0.762,\quad s = 0.156$$

5.1.14 Binding of X-Phenyl, N-Benzoyl-L-alaninates in Aqueous Phosphate Buffer (218)

$$
\begin{aligned}
\log 1/K_M ={}& 0.38(\pm 0.11)\pi_H + 0.19(\pm 0.07)\pi_S + 0.53(\pm 0.11)\sigma^- \\
& + 0.26(\pm 0.10)MR + 3.77(\pm 0.11)
\end{aligned}
\tag{1.94}
$$

$$n = 15,\quad r^2 = 0.806,\quad s = 0.200$$

5.1.15 Binding of X-Phenyl, N-Benzoyl-L-alaninates in Pentanol (218)

$$\log 1/K_M = 0.21(\pm 0.08)\pi_H + 0.31(\pm 0.05)\pi_S + 0.20(\pm 0.08)\sigma^- + 4.16(\pm 0.04) \tag{1.95}$$

$$n = 15,\quad r^2 = 0.787,\quad s = 0.160$$

Outliers in QSAR 1.92 included the 4-t-butyl and 4-OH analogs, whereas
the 4-CONH2 analog was an outlier in QSAR 1.93. These results were
recently reanalyzed by Kim (217, 218) with respect to the role of
enthalpic and entropic contributions to ligand binding with
α-chymotrypsin. Use of the Fujiwara hydrophobic enthalpy parameter
$\pi_H$ and the hydrophobic entropy parameter $\pi_S$ led to the
development of QSAR 1.94 and 1.95 (219).




Figure 1.5. X-Phenyl, N-benzoyl-L-alaninates.



The disappearance of the MR term in QSAR 1.93 and 1.95 is significant.
The MR term usually relates to nonspecific, dispersive interactions in
polar space. Thus, its presence in QSAR 1.92 and 1.94 suggests that
substrates bearing polarizable substituents may displace the
ordered-category II water molecules. In pentanol, the substrate may be
faced with the task of displacing pentanol, not water, from the enzyme,
and thus the MR term is no longer of consequence. QSAR 1.94 also
indicates that the enthalpy term $\pi_H$ plays a more critical role in
binding than the entropy term $\pi_S$. Note that these roles are
reversed in QSAR 1.95, suggesting that binding in pentanol is largely an
entropy-driven process. Similar results were obtained by Compadre et al.
in a study on the hydrolysis of X-phenyl-N-benzoyl-glycinates by
cathepsin B in aqueous buffer and acetonitrile (220). Kim's analysis
provides an excellent example of a study that focuses on mechanistic
interpretation and clearly demonstrates that a thermodynamic approach in
QSAR can provide pertinent information about the energetics of the
ligand binding process.
5α-Reductase, a critical enzyme in male sexual development, mediates the
reduction of testosterone to dihydrotestosterone (DHT). Elevated levels
of DHT in certain disease states, such as benign prostatic hypertrophy
and prostatic cancer, drive the need for effective inhibitors of
5α-reductase. A recent QSAR study on inhibition of human 5α-reductase,
type 1, by various steroid classes was carried out by Kurup et al.
(204, 221, 222). A few of the models will be examined to demonstrate the
importance and power of lateral validation. The three classes of
steroidal inhibitors are depicted in Fig. 1.6.



5.1.16 Inhibition of 5-α-Reductase by 4-X, N-Y-6-azaandrost-17-CO-Z-4-ene-3-ones, I

Log l/Ki 



n = ZL, r“ = u.»zy, s = u.4ut> 
outliers: X = Y = H, Z = NHCMe3;
X = Me, Y = H, Z = CH2CHMe2




(III) 

Figure 1.6. Steroidal inhibitors of 5a-reductase. 



5.1.17 Inhibition of 5-α-Reductase by 17β-(N-(X-phenyl)carbamoyl)-6-azaandrost-4-ene-3-ones, II

$$\log 1/K_i = 0.35(\pm 0.09)\mathrm{ClogP} + 0.26(\pm 0.11)B5_{ortho} + 5.08(\pm 0.58) \tag{1.97}$$

$$n = 12,\quad r^2 = 0.942,\quad s = 0.154$$

outlier: 2,5-(CF3)2



5.1.18 Inhibition of 5-α-Reductase by 17β-(N-(1-X-phenyl-cycloalkyl)carbamoyl)-6-azaandrost-4-ene-3-ones, III

$$\log 1/K_i = 0.32(\pm 0.17)\mathrm{ClogP} + 6.34(\pm 1.15) \tag{1.98}$$

$$n = 5,\quad r^2 = 0.920,\quad s = 0.090$$

outlier: n = 5, X = 4-t-Bu



In all these equations, the coefficients with hydrophobicity, as
represented by ClogP, suggest that binding of these azaandrostene-ones
occurs on the surface of the binding site, where partial desolvation can
occur. I is an indicator variable that pinpoints the negative effect of
a double bond at C-1. A bulky substituent on N-6 is detrimental to
activity, whereas a large substituent in the ortho position on the
aromatic ring enhances activity (QSAR 1.97). The bulky ortho
substituents (mostly t-Bu) may destroy coplanarity with the amide
bridge, perhaps by twisting the phenyl ring and enhancing its
hydrophobic contact with the binding site on the enzyme. Note that the
larger intercept in QSAR 1.98 versus QSAR 1.97 suggests that
hydrophobicity is more important in this area.

5.2 Interactions at the Cellular Level 

QSAR analysis of studies at the cellular level allows us to get a handle
on the physicochemical parameters critical to pharmacokinetic processes,
mostly transport. Cell culture systems offer an ideal way to determine
the optimum hydrophobicity of a system that is more complex than an
isolated receptor. Extensive QSAR have been developed on the toxicity of
3-X-triazines to many mammalian and bacterial cell lines (202, 209). A
comparison of the cytotoxicities of these analogs vs. sensitive murine
leukemia cells (L1210/S) and methotrexate-resistant murine leukemia
cells (L1210/R) reveals some startling differences.

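5.2.1 Inhibition of Growth of L1210/S by 3-X-Triazines (209)

$$
\begin{aligned}
\log 1/IC_{50} ={}& 1.13(\pm 0.18)\pi - 1.20(\pm 0.21)\log(\beta \cdot 10^{\pi} + 1) + 0.66(\pm 0.23)I_R \\
& - 0.32(\pm 0.17)I_{OR} + 0.94(\pm 0.37)\sigma + 6.72(\pm 0.13)
\end{aligned}
\tag{1.99}
$$

$$n = 61,\quad r^2 = 0.792,\quad s = 0.241,\quad \pi_o = 1.45(\pm 0.93),\quad \log \beta = -0.274$$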
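The optimum hydrophobicity quoted with QSAR 1.99 follows from the bilinear form itself. For a model of the type a·x − b·log(β·10^x + 1) + c with b > a > 0, setting the derivative with respect to x to zero gives β·10^x = a/(b − a), so x_o = log[a/(b − a)] − log β. The sketch below evaluates this expression with the rounded coefficients printed for QSAR 1.99; the function name is an assumption made for illustration, and the small difference from the reported value of 1.45 presumably reflects rounding of the published coefficients.

```python
import math

def bilinear_optimum(a, b, log_beta):
    """Optimum x of a*x - b*log10(beta*10**x + 1) + c.

    A maximum exists only when b > a > 0 (ascending limb of slope a,
    descending limb of slope a - b).
    """
    if not (b > a > 0):
        raise ValueError("the bilinear model has no maximum unless b > a > 0")
    return math.log10(a / (b - a)) - log_beta

# Rounded coefficients of QSAR 1.99 as printed in the text
pi_opt = bilinear_optimum(a=1.13, b=1.20, log_beta=-0.274)
print(round(pi_opt, 2))  # 1.48, close to the reported optimum of 1.45
```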

5.2.2 Inhibition of Growth of L1210/R by 3-X-Triazines (209)

$$\log 1/IC_{50} = 0.42(\pm 0.05)\pi - 0.15(\pm 0.05)MR + 4.83(\pm 0.11) \tag{1.100}$$

$$n = 62,\quad r^2 = 0.885,\quad s = 0.220$$

There is a radical difference between these two QSAR. QSAR 1.99 is very
similar to the one (QSAR 1.84) obtained versus the L1210 DHFR, and it
can be posited that the cytotoxicity in the sensitive cell line results
from the inhibition of the enzyme. The intercepts suggest that slight
interference with folate metabolism significantly affects growth. A
comparison of the sensitive and resistant QSAR reveals a substantial
difference in the coefficients with $\pi$. The lack of many variables in
QSAR 1.100 and its overall simplicity suggest that inhibition of the
enzyme is not the critical step; rather, transport to the site of action
in these resistant cells may be of utmost importance. This particular
cell line was resistant to methotrexate by virtue of elevated levels of
DHFR and also overexpression of the glycoprotein GP-170 (209). Thus,
modified transport through the dysfunctional membrane would severely
curtail the partitioning process, resulting in a coefficient with $\pi$
(0.42) that is only one-half of what is normally seen. The negative
coefficient with the MR term indicates that size plays a role, albeit a
negative one, in passage through the GP-170-fortified membrane and to
the site of action.

The QSAR paradigm has been shown to be particularly useful in
environmental toxicology, especially in acute toxicity determinations of
xenobiotics (223). There has recently been an emphasis on "transparent,
mechanistically comprehensive QSAR for toxicity," a move that is
welcomed by many researchers in the field (224, 225). Cronin and Schultz
developed QSAR 1.101 to describe the polar, narcotic toxicity of a large
set of substituted phenols. A number of phenols with ionizable or
reactive groups (e.g., -COOH, -NO, -NH2, or -NHCOCH3) were omitted from
the final analysis (226).

5.2.3 Inhibition of Growth of Tetrahymena pyriformis (40 h)

$$\log 1/C = 0.67(\pm 0.02)\mathrm{ClogP} - 0.67(\pm 0.55)E_{LUMO} - 1.12 \tag{1.101}$$

$$n = 120,\quad r^2 = 0.893,\quad s = 0.271$$

Using Hammett $\sigma$ constants, Garg et al. rederived QSAR 1.102 for
the same set and QSAR 1.103 and 1.104 for the diverse set of multi-,
di-, and monophenols, which were sequestered into two subsets containing
electron-releasing and electron-attracting substituents, respectively
(227).

5.2.4 Inhibition of Growth of T. pyriformis by Phenols (using σ) (227)

$$\log 1/C = 0.64(\pm 0.04)\mathrm{ClogP} + 0.61(\pm 0.12)\sigma + 1.84(\pm 0.13) \tag{1.102}$$



5.2.7 Inhibition of Growth of T. pyriformis by Aromatic Compounds (229)

Log l/IgCso 

= 0.6331ogP — 0.526.Elumo ^ 

+ 0. 721/2,4 AP ~ 1- ® 1/strong acid 

-I- 0.3142H-donor - 1,39 
n = 26S, H = 0.780, s = 0.393 



$$n = 119,\quad r^2 = 0.896,\quad s = 0.265$$

5.2.5 Inhibition of Growth of T. pyriformis by Electron-Releasing Phenols (227)

$$\log 1/C = 0.66(\pm 0.05)\mathrm{ClogP} + 1.63(\pm 0.15) \tag{1.103}$$

$$n = 44,\quad r^2 = 0.946,\quad s = 0.182$$



5.2.6 Inhibition of Growth of T. pyriformis by Electron-Attracting Phenols (227)

$$\log 1/C = 0.63(\pm 0.07)\mathrm{ClogP} + 0.54(\pm 0.16)\textstyle\sum\sigma + 1.92(\pm 0.18) \tag{1.104}$$

$$n = 100,\quad r^2 = 0.836,\quad s = 0.327$$



There is excellent agreement between QSAR 
1.101 and QSAR 1.104, in terms of the impor- 
tance of hydrophobicity and electron demand of 
the substituents: the coefficients with ClogP are 
similar and there is a good correspondence be- 
tween (j. Nevertheless, separation of 

the phenols into subsets, based on their electronic attributes,
indicates that different mechanisms of toxicity might be operative in
this organism, a phenomenon that has been duplicated in mammalian cells
(228). In a recent extension of toxicity studies on aromatics, Cronin
and Schultz used a two-parameter or response-surface approach to define
toxicity (229). In addition, indicator variables and group counts were
included to broaden the applicability of the approach. An excellent
comparison of the different modeling approaches (MLR, PLS, and
Bayesian-regularized neural networks) in QSAR is also made (229).



The indicator variables $I_{2,4AP}$ and $I_{\text{strong acid}}$ suggest
that 2- and 4-amino-substituted phenols enhance toxicity, whereas strong
acids decrease toxicity, respectively. The H-bond donor parameter may be
correcting for the added potency of amino phenols. The low $r^2$ may be
attributed to inherent variability in biological data and to the
commingling of data from four different studies. The wide variety of
compounds with different toxicity mechanisms, present in this combined
study, would also be a contributing factor to the low $r^2$. Overall,
this regression-based approach shows adequate predictability and is
transparent, thus aiding in mechanistic interpretation.

5.3 Interactions In Vivo 

The paucity of QSAR studies in whole animals is understandable in terms
of the costs, the heterogeneity of the biological data, and the
complexity of the results. Nevertheless, in the few studies that have
been done, excellent QSAR have been obtained, despite the small number
of subjects in the data set (164). One particular example is insightful.
The renal and nonrenal clearance rates of a series of 11 β-blockers,
including bufuralol, tolamolol, propranolol, alprenolol, oxprenolol,
acebutol, timolol, metoprolol, prindolol, atenolol, and nadolol, were
measured (230). The following QSAR were formulated using those data
(164).

5.3.1 Renal Clearance of β-Adrenoreceptor Antagonists

$$\log k = -0.42(\pm 0.12)\mathrm{ClogP} + 2.35(\pm 0.24) \tag{1.106}$$

$$n = 10,\quad r^2 = 0.888,\quad s = 0.185$$
5.3.2 Nonrenal Clearance of β-Adrenoreceptor Antagonists

$$\log k = 1.94(\pm 0.61)\mathrm{ClogP} - 2.00(\pm 0.80)\log(\beta \cdot P + 1) + 1.29(\pm 0.30) \tag{1.107}$$

$$n = 10,\quad r^2 = 0.950,\quad s = 0.168,\quad \mathrm{ClogP}_o = 2.6 \pm 1.5,\quad \log \beta = -0.813$$

outlier: oxprenolol

It is apparent from QSAR 1.106 and 1.107 that the hydrophobic
requirements of the substrates vary considerably. As expected, renal
clearance is enhanced in the case of hydrophilic drugs, whereas nonrenal
clearance shows a strong dependency on hydrophobicity. Note that QSAR
1.107 is stretching the limits of the bilinear model with only 10 data
points! The 95% confidence intervals are also large but, nevertheless,
the equations serve to emphasize the difference in clearance mechanisms
that are clearly linked to hydrophobicity.
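The opposite hydrophobic requirements can be seen by evaluating the two equations over a range of ClogP values. The following is a minimal sketch using the coefficients of QSAR 1.106 and 1.107 as printed, ignoring their confidence intervals. It assumes that P in the bilinear term equals 10^ClogP; the function names and the chosen ClogP values are hypothetical.

```python
import math

LOG_BETA = -0.813  # log beta reported with QSAR 1.107

def log_k_renal(clogp):
    """QSAR 1.106: renal clearance falls as hydrophobicity rises."""
    return -0.42 * clogp + 2.35

def log_k_nonrenal(clogp):
    """QSAR 1.107 (bilinear). Assumption: P = 10**ClogP."""
    beta_p = 10 ** (LOG_BETA + clogp)
    return 1.94 * clogp - 2.00 * math.log10(beta_p + 1) + 1.29

# Hypothetical ClogP values spanning hydrophilic to hydrophobic drugs
for clogp in (0.0, 1.0, 2.0, 3.0):
    print(f"ClogP = {clogp:.1f}  "
          f"log k(renal) = {log_k_renal(clogp):.2f}  "
          f"log k(nonrenal) = {log_k_nonrenal(clogp):.2f}")
```

Over this range the renal values decrease steadily while the nonrenal values rise, which mirrors the qualitative conclusion drawn from the two models.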

In formulating QSAR, it is useful to use a well-designed series to
optimize a particular biological activity. It is also important to
ensure that the ratio of compounds to parameters is at least 5, so that
collinearity is minimized while spanned space is maximized. A normal
distribution of biological data is necessary. A violation of these
guidelines usually leads to statistically insignificant QSAR or models
that defy predictability. One of our earliest works on the inhibition of
E. coli DHFR by 2,4-diamino-5-X-benzyl pyrimidines led to the derivation
of the following equation (231):

$$\log 1/K_i = -1.13\sigma_R + 5.54 \tag{1.108}$$

$$n = 10,\quad r^2 = 0.972,\quad s = 0.182$$

Most of the variance in these data was explained by the Hammett
through-resonance constant ($\sigma_R$). It implied that
electron-releasing substituents enhanced inhibitory potency. Later,
expanded and extensive studies on this system revealed that inhibition
of the bacterial enzyme was related mostly to steric effects and that
there was no dependency on electronic terms. Careful analysis of the
initial data revealed that it had a limited range in hydrophobicity and
steric attributes. Because no other QSAR were available at that time to
validate its findings, QSAR 1.108 was statistically significant but
mechanistically weak. Most weak QSAR formulations violate the
compound-to-parameter ratio rule (232, 233).
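The compound-to-parameter guideline is easy to check before a model is interpreted. The sketch below is illustrative only: the function names are hypothetical, the threshold of 5 is the guideline stated above, and the parameter counts are read from the equations in this section, counting β in the bilinear model as a fitted parameter (an assumption).

```python
def compounds_per_parameter(n_compounds, n_parameters):
    """Ratio of data points to fitted descriptor parameters."""
    if n_parameters < 1:
        raise ValueError("need at least one fitted parameter")
    return n_compounds / n_parameters

def flag_underdetermined(models, minimum_ratio=5.0):
    """Names of models whose ratio falls below the guideline."""
    return [name for name, (n, k) in models.items()
            if compounds_per_parameter(n, k) < minimum_ratio]

# (number of compounds, number of fitted parameters excluding the intercept)
models = {
    "QSAR 1.106": (10, 1),  # ClogP
    "QSAR 1.107": (10, 3),  # ClogP, bilinear term, beta (assumed count)
    "QSAR 1.108": (10, 1),  # one electronic term
}

flagged = flag_underdetermined(models)
print(flagged)
```

A passing ratio is necessary rather than sufficient: QSAR 1.108 satisfies it, yet the limited range of hydrophobic and steric properties in its data set still led to a mechanistically weak model.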

6 COMPARATIVE QSAR 

6.1 Database Development 

There are literally dozens of databases containing information about
chemical structures, synthetic methods, and reaction mechanisms. The
C-QSAR database is a database for QSAR models (164, 234). It was
designed to organize QSAR data on physical (PHYS) organic reactions as
well as chemical-biological (BIO) interactions, in numerical terms, to
bring cohesion and understanding to mechanisms of chemical-biodynamics.
The two databases are organized on a similar format, with the emphasis
on reaction types in the PHYS database. The entries in the BIO database
are sequestered into six main groups: macromolecules, enzymes,
organelles, single-cell organisms, organs/tissues, and multicellular
organisms (e.g., insects). The combined databases or the separate PHYS
or BIO databases can be searched independently by a string search or by
a search using the SMILES notation. A SMILES search can be approached in
three ways: one can identify every QSAR that contains a specific
molecule, one can use a MERLIN search that locates all derivatives of a
given structure, or one can search on single or multiple parameters. For
a more thorough description of the C-QSAR database and ways to search
it, see Hansch et al. (234) and Hansch et al. (164). The net result of
searching the QSAR database is to "mine" for models; one could thus call
it model-mining.

6.2 Database: Mining for Models 

To enhance our understanding of ligand-receptor interactions and bring
coherence to these relationships, there needs to be a concerted effort
not only to develop high-quality regressions but also to create models
that resonate with those drawn from mechanistic organic chemistry. A
comprehensive, integrated database, C-QSAR, allows us to do so; it
contains over 16,000 examples drawn from all facets of chemistry and
biology. An example on the toxicity of X-phenols will illustrate the
usefulness of this database (164, 228, 235-238). Recently, increasing
numbers of QSAR for phenols have been based on Brown's $\sigma^+$ term,
an electronic term that was first designed to rationalize electronic
effects of substituents on electrophilic aromatic substitution. Studies
conducted at EPA gave early indications that embryologic defects of rat
embryos in vitro could be correlated by $\sigma^+$, as seen in QSAR
1.109 (239).

Table 1.5 Rho Values for Chemical and Biochemical Reactions

     Solvent                   Radical Reagent    n     ρ (σ+)

Hydrogen Abstraction from Unhindered Phenols
1    CCl4                      (CH3)3CO•          14    -1.81 (±0.77)
2    Benzene                   (CH3)3CO•          12    -0.82 (±0.08)
3    CCl4                      (CH3)3CO•           5    -0.82 (±0.16)

X-phenols-Enzyme Systems
1    Horseradish peroxidase    -                  13    -2.68 (±0.78)
2    Lactoperoxidase           -                  11    -1.34 (±0.55)

6.2.1 Incidence of Tail Defects of Embryos (235)

$$\log 1/C = -0.58(\pm 0.21)\sigma^+ + 3.51(\pm 0.14) \tag{1.109}$$

$$n = 10,\quad r^2 = 0.832,\quad s = 0.189$$

Soon, this parameter was shown to correlate radical reactions in
chemistry as well as chemical-biological interactions in an extensive
compilation (240). Another older study by Richard et al. on the
inhibition of replicative DNA synthesis in Chinese hamster ovary cells
was examined and led to the development of Equation 1.110 (241). Again,
there was a dependency on $\sigma^+$.



6.2.2 Inhibition of DNA Synthesis in CHO Cells by X-Phenols (236)

$$\log 1/C = -0.74(\pm 0.34)\sigma^+ - 1.02(\pm 0.41)\mathrm{CMR} + 6.97(\pm 1.16) \tag{1.110}$$

$$n = 9,\quad r^2 = 0.915,\quad s = 0.305$$

These Brown $\rho^+$ values were in line with those obtained from
chemical and biological systems (228); see Table 1.5.

Cytotoxicity studies of X-phenols versus L1210 cells in culture led to
an unusual result, which was baffling but reminiscent of Hammett plots
related to changes in mechanism (228).

6.2.3 Inhibition of Growth of L1210 by X-Phenols

$$
\begin{aligned}
\log 1/IC_{50} ={}& -0.83(\pm 0.18)\sigma^+ + 0.74(\pm 0.28)(\sigma^+)^2 + 0.56(\pm 0.15)\log P \\
& - 0.45(\pm 0.21)\log(\beta \cdot P + 1) + 2.70(\pm 0.26)
\end{aligned}
\tag{1.111}
$$

$$n = 39,\quad r^2 = 0.913,\quad s = 0.229,\quad \log P_o = -0.18,\quad \log \beta = -2.28$$

outliers: 4-C2H5, 3-NH2

Sequestering of the data into two subsets with varying electronic
attributes ($\sigma^+ > 0$ and $\sigma^+ < 0$) led to the derivation of
the following equations.

6.2.4 Inhibition of Growth of L1210 by Electron-Withdrawing Substituents (σ+ > 0)

$$\log 1/IC_{50} = 0.62(\pm 0.16)\log P + 2.35(\pm 0.31) \tag{1.112}$$

$$n = 15,\quad r^2 = 0.845,\quad s = 0.232$$

outlier: 3-OH

6.2.5 Inhibition of Growth of L1210 by Electron-Donating Substituents (σ+ < 0)

$$\log 1/IC_{50} = -1.58(\pm 0.26)\sigma^+ + 0.21(\pm 0.06)\log P + 3.10(\pm 0.24) \tag{1.113}$$

$$n = 23,\quad r^2 = 0.898,\quad s = 0.191$$

outliers: 3-NH2, 4-NHAc

In QSAR 1.113, 62% of the variance is accounted for by $\sigma^+$ and
28% is explained by log P. It appears that free-radical-mediated
toxicity is responsible for the growth-inhibitory effects of the
phenols. Homolytic bond dissociation energies (BDE), related to the
homolytic cleavage of the OH bond in the reaction
X-C6H4OH + C6H5O• → X-C6H4O• + C6H5OH, have been used in lieu of
$\sigma^+$ values. The net result is similar, as seen in QSAR 1.114
(242).

$$\log 1/IC_{50} = -0.21(\pm 0.03)\mathrm{BDE} + 0.21(\pm 0.04)\log P + 3.11(\pm 0.17) \tag{1.114}$$

$$n = 52,\quad r^2 = 0.920,\quad s = 0.202$$

outliers: 4-NHAc, 3-NH2, 3-NMe2

This data set contains a wide diversity of phenolic inhibitors,
including a large number of ortho-substituted compounds, estrogenic
phenols (β-estradiol, DES, nonyl phenol), and other antioxidants whose
activities are well predicted by this model. The model suggests that
cytotoxicity is an outcome of phenoxy radical formation and subsequent
interaction with a relatively nonpolar receptor. The small hydrophobic
coefficient suggests that DNA could be a likely target.

The appearance of the $\sigma^+$ parameter in a large number of
reactions and interactions involving X-phenols indicates that the
phenoxy radical can be a potent, reactive intermediate in myriad
reactions. The availability of a fast, easily retrievable computerized
database to corroborate this phenomenon was useful. This approach of
lateral validation was crucial in establishing a QSAR model that was not
only statistically significant but also mechanistically interpretable.

6.3 Progress in QSAR 

The last four decades have seen major changes in the QSAR paradigm. In
tandem with developments in molecular modeling and X-ray
crystallography, it has impacted drug design and development in many
ways. It has also spawned 3D QSAR approaches that are routinely used in
computer-assisted molecular design. In terms of ligand design, it shares
center stage with other approaches such as structure-based ligand design
and other rational drug design approaches, including docking methods and
genetic algorithms (243). Success stories in QSAR have been recently
reviewed (244, 245). Bioactive compounds have emerged in agrochemistry,
pesticide chemistry, and medicinal chemistry.

Bifenthrin, a pesticide, was the product of a design strategy that used
cluster analysis (244) (Fig. 1.7). Guided by QSAR analysis, the chemists
at Kyorin Pharmaceutical Company designed and developed Norfloxacin, a
6-fluoroquinolone, which heralded the arrival of a new class of
antibacterial agents (246) (Fig. 1.7). Two azole-containing fungicides,
metconazole (Fig. 1.8) and ipconazole, were launched in 1994 in France
and Japan, respectively (247). Lomerizine, a
4-F-benzhydryl-4-(2,3,4-trimethoxybenzyl)piperazine, was introduced into
the market in 1999 after extensive design strategies using QSAR (248)
(Fig. 1.8). Flobufen, an anti-inflammatory agent, was designed by Kuchar
et al. as a long-acting agent without the usual gastric toxicity (249)
(Fig. 1.8). It is currently in clinical trial. Other examples of the
commercial utility of QSAR include the development of metamitron and
bromobutide (250). In most of these examples, QSAR was used in
combination with other rational drug-design strategies, which is a
useful and generally fruitful approach.

In addition to these commercial successes, 
the QSAR paradigm has steadily evolved into 
a science. It is empirical in nature and it seeks 
to bring coherence and rigor to the QSAR 
models that are developed. By comparing 
models one is able to more fully comprehend 
scientific phenomena with a "global" perspec- 
tive; trends in patterns of reactivity or biolog- 
ical activity become self-evident. 

7 SUMMARY 

QSAR has done much to enhance our understanding of fundamental processes
and phenomena in medicinal chemistry and drug design (251). The concept
of hydrophobicity and its calculation has generated much knowledge and
discussion as well as spawned a mini-industry. QSAR has refined our
thinking on selectivity at the molecular and cellular level. Hydrophobic
requirements vary considerably between tumor-sensitive cells and
resistant ones. It has allowed us to design more selectivity into
antibacterial agents that bind to dihydrofolate reductase. QSAR studies
in the pharmacokinetic arena have established different hydrophobic
requirements for renal/nonrenal clearance, whereas the optimum
hydrophobicity for CNS penetration has been determined by Hansch et al.
(252). QSAR has helped delineate allosteric effects in enzymes such as
cyclooxygenase and trypsin, and in the well-defined and complex
hemoglobin system (253, 254).

Figure 1.8. Lomerizine, Metconazole, and Flobufen.

QSAR has matured over the last few de- 
cades in terms of the descriptors, models, 
methods of analysis, and choice of substitu- 
ents and compounds. Embarking on a QSAR 
project may be a daunting and confusing task 
to a novice. However, there are many excellent 
reviews and tomes (1, 4, 19, 58-60) on this
subject that can aid in the elucidation of the 
paradigm. Dealing with biological systems is 
not a simple problem and in attempting to de- 
velop a QSAR, one must always be cognizant 
of the biochemistry of the system analyzed 
and the limitations of the approach used. 
1 INTRODUCTION

Quantitative structure-activity relationship 
(QSAR) methodology was introduced by 
Hansch et al. in the early 1960s (1,2). The 
approach stemmed from linear free-energy re- 
lationships in general and the Hammett equa- 
tion in particular (3). It is based on the as- 
sumption that the difference in structural 
properties accounts for the difference in bio- 
logical activities of compounds. According to 
this approach, the structural changes that af- 
fect the biological activities of a set of conge- 
ners are of three major types: electronic, 
steric, and hydrophobic (4). These structural 
properties are often described by Hammett 
electronic constants (5), Verloop STERIMOL 
parameters (6), hydrophobic constants (5), to 
name but a few. The relationship between a 
biological activity (or chemical property) and 
the structural parameters is obtained through 
the use of linear or multiple linear regression 
(MLR) analysis. The fundamentals and appli- 
cations of this method in chemistry and biol- 
ogy have been summarized by Hansch and Leo 
(4) and an account of the most recent developments in this area of
traditional QSAR appears in the chapter by Selassie in this series
(7). As discussed in that chapter, the history of
modern QSAR counts over 40 years of active 
research in method development and its appli- 
cations. It is practically impossible to review 
all, even relatively recent, developments in the 
field in a single chapter. Several reviews and 
monographs on QSAR and its applications 
have been published in recent years (4, 8-12) 
and the reader is referred to this collection of 
general references and publications cited 
therein for additional in-depth information. 

One of the most characteristic features of modern-age QSAR as an
integral part of drug design and discovery is the unprecedented growth
of biomolecular databases, which contain data on chemical structure and,
in some cases, biological activity (or other relevant drug properties
such as toxicity or mutagenicity) of chemicals. Figure 2.1 illustrates
the fast growth of one such database, the Chemical Abstracts Service
(CAS) registry file (13). The growth has been phenomenal: CAS currently
contains more than 39 million compounds, including biological sequences
[and it does not include chemical libraries, which literally include
billions of compounds (14)]. Naturally, the growth of molecular
databases has been concurrent with the acceleration of the drug
discovery process. According to an excellent, recent historical account
of drug discovery (15), as the result of high throughput screening (HTS)
technologies, the amount of raw data points obtained by a large
pharmaceutical company per year has increased from approximately 200,000
at the beginning of last decade to around 50 million today. The total
number of drugs used worldwide is approximately 80,000, which reportedly
act at fewer than 500 confirmed molecular targets (15). Recent estimates
suggest that the number of potential targets lies between 5000 and
10,000, approximately 10-fold greater than the number of targets
currently pursued (15). Although traditional QSAR modeling has typically
been limited to dealing with at most several dozen compounds at a time,
rapid generation of large quantities of data requires new methodologies
for data analysis. New approaches need to be developed to establish QSAR
models for hundreds, if not thousands, of molecules. These new methods
should be robust, yet computationally efficient, to compete with the
experimental methods of drug discovery, such as combinatorial chemistry
and HTS.

Figure 2.1. Growth in the number of chemical compounds, excluding
biopolymers, registered by the Chemical Abstracts Service (CAS), by
year.
This chapter concentrates on recent trends and developments in QSAR
methodology, which are characterized by the growing size of the data
sets subjected to QSAR analysis, use of multiple descriptors of chemical
structure, application of both linear and, especially, nonlinear
optimization algorithms applicable to multidimensional data modeling,
growing emphasis on rigorous model validation, and application of QSAR
models as virtual screening tools in database mining and chemical
library design. We begin by presenting a unified concept of QSAR,
emphasizing common aspects of different QSAR methodologies. We then
consider some popular approaches to the derivation of molecular
descriptors and optimization algorithms in the context of three
important components of any QSAR investigation: model development, model
validation, and model utility. We conclude with several remarks on the
present status and future developments in this exciting research
discipline.

1.1 A Unified Concept of QSAR 

An inexperienced user or sometimes even an 
avid practitioner of QSAR could be easily con- 
fused by the multitude of methodologies and 
naming conventions used in QSAR studies. 
Two-dimensional (2D) and three-dimensional 
(3D) QSAR, variable selection and artificial 
neural network methods, comparative molec- 
ular field analysis (CoMFA), and binary QSAR 
present examples of various terms that may 
appear to describe totally independent ap- 
proaches, which cannot be even compared to 
each other. In fact, any QSAR method can be 
generally defined as the application of mathe- 
matical and statistical methods to the problem 
of finding empirical relationships (QSAR mod- 
els) cf the form Pj = ^(Di,D„ ■ ■ .D„), where 
are biological activities (or other properties of 
interest) of molecules, D-^, D„ . . • ,D^ are cal- 
culated (or, sometimes, experimentally mea- 
sured) structural properties (molecular de- 
scriptors) of compounds, and k is some 
empirically established mathematical trans- 
formation that should be applied to descrip- 
tors to calculate the property values for all 
molecules. The relationship between values of 
descriptors D and target properties P can be 
linear [e.g., multiple linear regression (MLR) 
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{P} = K{D} 

Figure 2.2. Standard QSAR table is a general 
starting point of any QSAR approach. 

as in the Hansch QSAR approach], where tar- 
get property can be calculated directly from 
the descriptor values, or nonlinear (such as 
artificial neural networks or classification 
QSAR methods), where descriptor values are 
used in characterizing chemical similarity be- 
tween molecules, which in turn is used to pre- 
dict compound activity. In general, each com- 
pound can be represented by a point in a 
multidimensional space, in which descriptors 
$D_1, D_2, \ldots, D_n$ serve as independent coordinates
of the compound. The goal of QSAR modeling is to establish
a trend in the descriptor values, which correlates, in a
linear or nonlinear fashion, with the trend in biological
activity. All QSAR approaches imply, directly or
indirectly, a simple similarity principle, which 
for a long time has provided a foundation 
for experimental medicinal chemistry: com- 
pounds with similar structures are expected to 
have similar biological activities. This implies 
that points representing compounds with similar
activities in multidimensional descriptor
space should be geometrically close to each 
other, and vice versa. 
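
The similarity principle can be illustrated with a minimal sketch, assuming each compound is already represented as a vector of descriptor values. The function names and the toy data below are hypothetical and are not taken from any cited method; the sketch simply predicts the activity of a query compound as the mean activity of its k nearest neighbors in descriptor space.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Predict activity as the mean activity of the k nearest training compounds.

    `train` is a list of (descriptor_vector, activity) pairs (hypothetical layout).
    """
    nearest = sorted(train, key=lambda row: euclidean(row[0], query))[:k]
    return sum(activity for _, activity in nearest) / len(nearest)


# Toy data (invented for illustration): three similar compounds and one outlier.
train = [
    ([0.0, 0.0], 1.0),
    ([0.0, 1.0], 1.2),
    ([1.0, 0.0], 0.8),
    ([10.0, 10.0], 9.0),
]
prediction = knn_predict(train, [0.2, 0.2], k=3)
```

In practice descriptors would normally be scaled before distances are computed, because otherwise the descriptor with the largest numerical range dominates the distance.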

Despite formal differences between various 
methodologies, any QSAR method is based on 
a QSAR table, which can be generalized, as 
shown in Fig. 2.2. To initiate a QSAR study, 
this table must include some identifiers of 
chemical structures (e.g., company's ID num- 
bers, first column of the table in Fig. 2.2), re- 
liably measured values of biological activity 
[or any other target property of interest (e.g., 
solubility, metabolic transformation rate, etc.;
second column)], and calculated values of molecular
descriptors in all remaining columns
(sometimes, experimentally determined phys- 
ical properties of compounds can be used as 
descriptors as well). 

The differences in various QSAR method- 
ologies can be understood in terms of types of 
target property values, types of descriptors, 
and differences in optimization algorithms 
used to relate descriptors to the target proper- 
ties. The target property values can be defined 
as activity classes [i.e., active or inactive, fre- 
quently encoded numerically for the purpose 
of the subsequent analysis as one (for active) 
or zero (for inactive)] or as a continuous range 
of values; the corresponding methods of data 
analysis are referred to as classification or con- 
tinuous property QSAR, respectively. Descrip- 
tors can be generated from various represen- 
tations of molecules (e.g., 2D chemical graphs 
or 3D molecular geometries), giving rise to the 
terms of 2D- or 3D-QSAR, respectively. Fi- 
nally, the types of optimization algorithms 
used in the QSAR model development lead to 
the definitions of linear versus nonlinear 
QSAR methods. 

In some cases, the types of biological data, 
the choice of descriptors, and the class of opti- 
mization methods are closely related and mu- 
tually inclusive. For instance, multiple linear 
regression can be applied only when a rela- 
tively small number of molecular descriptors 
are used (at least five to six times smaller than 
the total number of compounds) and the tar- 
get property is characterized by a continuous 
range of values. The use of a large number of descriptors
makes it impossible to use MLR because
of a high chance of spurious correlation (16)
and requires the use of partial least squares or
nonlinear optimization techniques. However,
in general, for any given data set a user could 
choose between various types of descriptors 
and various optimization schemes, combining 
them in a practically mix-and-match mode, to 
arrive at statistically significant QSAR models 
in a variety of ways. This situation is in es- 
sence analogous to molecular mechanics cal- 
culations (17), where different force fields and 
differently derived parameters are developed 
by different groups, although the common 
goal is to compute (unique) optimized geometries
of molecules from their chemical composition
and coordinates of all atoms. Thus, in
general, all QSAR models can be universally 
compared in terms of their statistical signifi- 
cance and, most important, their ability to 
predict accurately biological activities (or 
other target properties) of molecules not in- 
cluded in the training set (cf. molecular me- 
chanics, where different methods are ulti- 
mately compared by their ability to reproduce 
experimental molecular geometries). This 
concept of statistical robustness and the pre- 
dictive ability as universal characteristics of 
any QSAR model independent of the particu- 
lars of individual approaches should be kept in 
mind as we consider examples of QSAR tools, 
their applications, and pitfalls in the subse- 
quent sections of this chapter. 

1.2 The Taxonomy of QSAR Approaches 

Many different approaches to QSAR have 
been developed since Hansch's seminal work. 
As briefly discussed above, the major differ- 
ences between these methods can be analyzed 
from two viewpoints: (1) the types of structural
parameters that are used to characterize
molecular identities, starting from different
representations of molecules, from simple
chemical formulas to three-dimensional conformations;
and (2) the mathematical procedure
that is employed to obtain the quantitative
relationship between these structural
parameters and biological activity. 

On the basis of the origin of molecular de- 
scriptors used in calculations, QSAR methods 
can be divided into three groups. One group is 
based on a relatively small number (usually 
many times smaller than the number of com- 
pounds in a data set) of physicochemical prop- 
erties and parameters describing, for example, 
hydrophobic, steric, and electrostatic effects. 
Usually, these descriptors are used as independent
variables in multiple regression approaches
(18). In the literature, these methods
are typically referred to as Hansch analysis
(8). These types of descriptors and the corresponding
linear optimization methods used in traditional
QSAR analyses are discussed extensively
in the chapter by Selassie (7) and
therefore are not reviewed here.

More recent methods are based on quanti- 
tative characteristics of molecular graphs 
(molecular topological descriptors). Because
molecular graphs or structural formulas are
"two-dimensional," these methods are referred
to as 2D-QSAR. Most of the 2D-QSAR
methods are based on graph theoretical indices,
which have been extensively studied by
Randic (19) and Kier and Hall (20-22). They
include, for example, molecular connectivity 
indices (19, 20), molecular shape indices (23, 
24), topological (25) and electrotopological 
state indices (26-29), and atom-pair descrip- 
tors (30, 31). Sometimes, topological descrip- 
tors are also combined with physicochemical 
properties of molecules. Although these struc- 
tural indices represent different aspects of 
molecular structures, and, what is important 
for QSAR, different structures provide nu- 
merically different values of indices, their 
physicochemical meaning is frequently un- 
clear. The successful applications of topologi- 
cal indices combined with multiple linear 
regression (MLR) analysis have been summa- 
rized by Kier and Hall (20, 21, 28). 

The third group of methods is based on de- 
scriptors derived from spatial (three-dimen- 
sional) representation of molecular struc- 
tures. Correspondingly, these methods are 
referred to as three-dimensional or 3D-QSAR; 
they have become increasingly popular with 
the development of fast and accurate compu- 
tational methods for generating 3D conforma- 
tions and alignments of chemical structures. 
The early examples of 3D-QSAR include mo- 
lecular shape analysis (MSA) (32), distance ge- 
ometry (33, 34), and Voronoi techniques (35). 
The first method uses shape descriptors and 
multiple linear regression analysis, whereas 
the latter methods apply atomic refractivity as 
structural descriptors and the solution of 
mathematical inequalities to obtain the quan- 
titative relationships. These two methods 
have been applied to the study of structure- 
activity relationships of many data sets by 
Hopfinger (e.g., Refs. 36, 37) and Crippen (e.g., 
Refs. 38, 39), respectively. 

Perhaps the most popular example of 3D-QSAR
is the comparative molecular field analysis
(CoMFA), developed by Cramer et al. (40),
which has elegantly combined the power of 3D
molecular modeling and partial least-squares
(PLS) optimization technique (41, 42) and
found wide applications in medicinal chemistry
and toxicity analysis (see below). Most
3D-QSAR methods require 3D alignment of all
molecules according to a pharmacophore
model or based on ligand docking to a receptor-binding
site. Descriptors in the case of
CoMFA (40, 43) and CoMFA-like methods
such as COMBINE (44), CoMSIA (45), and
QSiAR (46) represent electrostatic, steric, and
hydrophobic field values (to name but a few
examples) in the grid points surrounding molecules.

Finally, QSAR methods can also be classi- 
fied by the type of the correlation methods 
used in model development. Linear methods 
include linear regression or MLR, PLS (41, 42, 
47), or principal component regression (PCR), 
whereas nonlinear methods can be exempli- 
fied, for example, by k-Nearest Neighbors 
(kNN) (48, 49) and artificial neural networks 
(50) methods. An example of the linear meth- 
ods is provided by the ADAPT system, which 
employs topological indices as well as other 
calculable structural parameters (e.g., steric 
and quantum mechanical parameters), and 
the MLR method for QSAR analysis. It has 
been extensively applied to QSAR/QSPR stud- 
ies in analytical chemistry, toxicity analysis, 
and other biological activity prediction (51- 
54). Parameters derived from various experi- 
ments through chemometric methods have 
also been used in the study of peptide QSAR 
(55), where PLS analysis was employed. The 
latter technique has been used almost exclu- 
sively in 3D-QSAR, where the number of de- 
scriptors characterizing molecular fields may 
exceed the number of compounds by orders of 
magnitude. 

There has been a great deal of interest, es- 
pecially more recently, in the use of data min- 
ing methods to extract the information from 
large and/or chemically inhomogeneous data 
sets. Examples of these methods include pat- 
tern recognition (56, 57), automated structure 
evaluation (58, 59), neural network (60-62), 
and machine learning (63-65). Recent trends 
in QSAR studies also include developing opti- 
mal QSAR models through variable selection, 
that is, by selecting a subset of available de- 
scriptors in either MLR, PLS, or nonlinear 
classification or artificial neural networks 
(ANN) analysis as applied either in 2D- (66-72)
or in 3D-QSAR (73). These methods employ
either generalized simulated annealing
(67), genetic algorithms (68), or evolutionary
algorithms (69-72) as optimization tools.
The effectiveness and convergence of these al- 
gorithms are strongly affected by the choice of 
a fitting function, which drives the optimiza- 
tion process (70-72). It has been demon- 
strated that optimization combined with vari- 
able selection effectively improves QSAR 
models as compared to those without variable 
selection. For example, GOLPE (74) was developed
through the use of chemometric principles
and $q^2$-GRS (75) was developed on the
basis of independent CoMFA analysis of small
areas of CoMFA descriptor space, to address
the issue of region selection. Both of these 
methods have been shown to improve QSAR 
models compared to the original CoMFA tech- 
nique. 

Different QSAR methods have their own 
strengths and weaknesses. For example, 3D- 
QSAR methods generally result in the dia- 
grams of important molecular fields that can 
be easily interpreted in terms of specific steric 
and electrostatic interactions important for 
the ligand binding to their receptor. However, 
the need to align structures in 3D, which is 
time-consuming and subjective, precludes the 
use of 3D-QSAR techniques for the analysis of 
large data sets. On the other hand, 2D-QSAR 
methods are much faster and more amenable 
to automation because they require no confor- 
mational search and structural alignment. 
Thus, 2D methods are best suited for the anal- 
ysis of large numbers of compounds and com- 
putational screening of molecular databases; 
however, the interpretation of the resulting 
models in familiar chemical terms is fre- 
quently difficult, if not impossible. 

The generality of the QSAR modeling ap- 
proach as a drug discovery tool, irrespective of 
descriptor types or optimization algorithms, 
can be best demonstrated in the context of in- 
verse QSAR, which can be defined as design- 
ing or discovering molecular structures with a 
desired property on the basis of QSAR models 
(76-78). In practical terms, inverse QSAR also
includes searching for molecules with a de- 
sired target property in chemical databases or 
virtual chemical libraries. These consider- 
ations emphasize the universal importance of 
establishing QSAR model robustness and predictive
ability as opposed to concentrating on
explanatory power, which has been a characteristic
feature of many traditional QSAR approaches.

2 MULTIPLE DESCRIPTORS OF 
MOLECULAR STRUCTURE 

It has been said frequently that there are 
three keys to the success of any QSAR model 
building exercise: descriptors, descriptors, 
and descriptors. Many different molecular 
representations have been proposed, exempli- 
fied by Hansch-type parameters (2), topologi- 
cal indices (19, 79), quantum mechanical de- 
scriptors (80), molecular shapes (32, 81), 
molecular fields (40), atomic counts (82), 2D 
fragments (83-85), 3D fragments (86-88), 
molecular eigenvalues (89), molecular multipole
moments (90), E-state fields (28), molecular
fragment-based hash codes (91, 92), and
molecular holograms (93). A recent review by
Livingstone provides an excellent survey of
various 2D and 3D descriptors, along with
some associated diversity and similarity functions
(9). Various physicochemical parameters
such as the partition coefficient and molar refractivity,
and quantum mechanical quantities
such as highest occupied molecular orbital
(HOMO) and lowest unoccupied molecular orbital
(LUMO) energies, have been used to represent
molecular identities in early QSAR
studies by the use of linear and multiple linear
regression. However, these descriptors are not
suited for the analysis of large numbers of 
molecules, either because of the lack of physi- 
cochemical parameters for compounds yet to 
be synthesized or because of the computa- 
tional expenses required by quantum mechan- 
ical methods. Recent years have seen the ap- 
plication of various topological descriptors 
that are usually derived from either 2D or 3D 
molecular structural information based on the 
graph theory or molecular topology (20-22, 
94). These descriptors are generated on the 
basis of the molecular connectivity, 3D molec- 
ular topography, and molecular field proper- 
ties. 

2.1 Topological Descriptors 

Two widely applied examples of 2D molecular 
descriptors are molecular connectivity indices
(MCI) and atom-pair (AP) descriptors. Molecular
connectivity indices, $\chi$, were first formulated
by Randic (19) and subsequently generalized
and extended by Kier and Hall (20-22).
The fundamentals and applications of molecular
connectivity indices have been thoroughly
reviewed (22, 28). The popular MolConnZ
software (95) affords the computation of a 
wide range of topological indices of molecular 
structure. These indices include (but are not 
limited to) the following descriptors: simple 
and valence path, cluster, path/cluster and 
chain molecular connectivity indices, kappa 
molecular shape indices, topological and elec- 
trotopological state indices, differential 
connectivity indices, the graph's radius and 
diameter, Wiener and Platt indices, Shannon
and Bonchev-Trinajstic information indices, 
counts of different vertices, and counts of 
paths and edges between different kinds of 
vertices (19, 20, 96-100). 

Overall, MolConnZ (95) produces over 400 
different descriptors. Most of these descrip- 
tors characterize chemical structure, but sev- 
eral depend on the arbitrary numbering of at- 
oms in a molecule and are introduced solely for 
bookkeeping purposes. In a typical QSAR 
study, only about one-half of all possible Mol- 
ConnZ descriptors are eventually used, after 
deleting descriptors with zero value or zero 
variance. Figure 2.3 provides a summary of 
these molecular descriptors and presents 
some algorithms used in their derivation. 

The idea of using atom pairs as molecular 
features in structure-activity studies was first
proposed by Carhart et al. (84). AP descriptors
are defined by their atom types and topological
distance bins. An AP is a substructure defined
by two atom types and the shortest path separation
(or graph distance) between the atoms.
The graph distance is defined as the
smallest number of atoms along the path con- 
necting two atoms in a molecular structure. 
The general form of an atom-pair descriptor is 
as follows: 

atom type i (distance) atom type j 

where atom chemical types are typically defined
by the user. For example, 15 atom types
can be defined by use of the SYBYL mol2 format
(101) as follows: (1) negative charge center
(NCC); (2) positive charge center (PCC);
(3) hydrogen bond acceptor (HA); (4) hydrogen
bond donor (HD); (5) aromatic ring center
(ARC); (6) nitrogen atoms (N); (7) oxygen atoms
(O); (8) sulfur atoms (S); (9) phosphorus
atoms (P); (10) fluorine atoms (FL); (11) chlorine,
bromine, iodine atoms (HAL); (12) carbon
atoms (C); (13) all other elements (OE);
(14) triple bond center (TBC); and (15) double
bond center (DBC). The total number of
pairwise combinations of all 15 atom types
(counting pairs of identical types) is 120.
Furthermore, distance bins
should be defined to discriminate between 
identical atom pairs separated by different 
graph distances and therefore representing 
different molecular substructures. Thus, 15 
distance bins can be introduced in the interval 
between graph distance zero (i.e., zero atoms 
separating an atom pair) to 14 and greater. 
Thus, in this format a total of 1800 (120 X 15) 
AP descriptors can be generated for any mo- 
lecular structure. An example of an atom-pair 
descriptor is shown in Fig. 2.4. Frequently, as
applied to particular data sets, many of the
theoretically possible AP descriptors have
zero value (implying that certain atom types
or atom pairs are absent in molecular structures).
For instance, in our recent studies of 48
anticonvulsant agents, only 273 descriptors
with nonzero value and nonzero variance were
generated (102).
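
The counting of atom-pair descriptors can be sketched as follows. This is an illustrative simplification, not the implementation of Carhart et al.: it assumes each atom carries a single type label, represents the molecule as a hypothetical adjacency dictionary, and reads "graph distance" as the number of atoms separating the pair (bonds on the shortest path minus one), consistent with the statement that a distance of zero means zero intervening atoms. Separations of 14 or more fall into the last of the 15 bins.

```python
from collections import Counter, deque
from itertools import combinations_with_replacement


def bond_distances(adjacency, start):
    """Breadth-first search: bonds on the shortest path from `start` to each atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def atom_pair_counts(atom_types, adjacency, n_bins=15):
    """Count (type_i, distance_bin, type_j) occurrences over all atom pairs."""
    counts = Counter()
    for i in range(len(atom_types)):
        for j, bonds in bond_distances(adjacency, i).items():
            if j <= i:
                continue  # visit each unordered pair once
            separation = bonds - 1  # atoms lying between i and j (assumed reading)
            bin_index = min(separation, n_bins - 1)  # last bin holds "14 and greater"
            t1, t2 = sorted((atom_types[i], atom_types[j]))
            counts[(t1, bin_index, t2)] += 1
    return counts


# Toy molecule (hypothetical): the chain N-C-C-S.
types = ["N", "C", "C", "S"]
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
counts = atom_pair_counts(types, adjacency)

# 15 atom types give 120 unordered type pairs; with 15 bins, 1800 descriptors.
n_type_pairs = len(list(combinations_with_replacement(range(15), 2)))
n_descriptors = n_type_pairs * 15
```

Most of the 1800 possible keys never appear for a given data set, which is why only descriptors with nonzero value and nonzero variance are retained.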

2.2 3D Descriptors 

The rapid increase in structural three-dimen- 
sional (3D) information of bioorganic mole- 
cules (103, 104), coupled with the develop- 
ment of fast methods for 3D structure 
generation [e.g., CONCORD (105, 106) and 
CORINA (107)] and alignment [e.g., Active 
Analog Approach (43, 108)], have led to the 
development of 3D structural descriptors and 
associated 3D-QSAR methods. Many 3D- 
QSAR methods (considered below) make use 
of so-called molecular field descriptors. To cal- 
culate these descriptors, steric and electro- 
static fields of all molecules are sampled with a 
probe atom, usually an $sp^3$ carbon bearing a +1
charge, on a rectangular grid that encompasses
structurally aligned molecules. The
values of both van der Waals and electrostatic
interactions between the probe atom and all
atoms of each molecule are calculated in every
lattice point by use of the force field equation
described above and entered into the CoMFA
QSAR table (Fig. 2.5), which typically contains
thousands of columns. Additional molecular
field descriptors such as HINT (Hydropathic
INTeraction) descriptors (109) could improve
the CoMFA model. A PLS algorithm coupled
with leave-one-out (LOO) cross-validation is
typically used to arrive at statistically significant
CoMFA models.

Figure 2.3. Examples of topological descriptors frequently used in QSAR studies.

N--(7)--S

Figure 2.4. Example of an AP descriptor: two atom
types, aliphatic nitrogen and aliphatic sulfur, separated
by the shortest chemical graph path of seven.

Figure 2.5. Process of steric and electrostatic descriptor generation in CoMFA. Note that this
process results in a familiar QSAR table (cf. Fig. 2.2). PLS is used as a standard analytical technique
in CoMFA.
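
The generation of one row of a CoMFA-like QSAR table can be sketched as below. This is a rough illustration under stated assumptions, not the CoMFA implementation: the steric term is a generic 12-6 Lennard-Jones form with a single hypothetical well depth and radius for every atom, the electrostatic term is a plain Coulomb sum, and the energy cap of 30 kcal/mol is assumed here only to keep values finite near atomic centers. The function names and all parameter values are invented for illustration.

```python
import math
from itertools import product

COULOMB = 332.06  # kcal*Angstrom/(mol*e^2), standard Coulomb conversion factor


def lattice(lo, hi, step):
    """Points of a regular rectangular grid spanning the box from `lo` to `hi`."""
    def axis(a, b):
        n = int(math.floor((b - a) / step + 1e-9)) + 1
        return [a + k * step for k in range(n)]
    return list(product(axis(lo[0], hi[0]), axis(lo[1], hi[1]), axis(lo[2], hi[2])))


def field_row(atoms, grid, probe_charge=1.0, eps=0.1, r_min=3.4, cap=30.0):
    """Steric and electrostatic probe energies at every grid point for one molecule.

    `atoms` is a list of (x, y, z, partial_charge); eps, r_min, and cap are
    placeholder values, not parameters of any particular force field.
    """
    steric, electro = [], []
    for point in grid:
        s = e = 0.0
        for x, y, z, q in atoms:
            r = max(math.dist(point, (x, y, z)), 1e-6)  # avoid division by zero
            ratio = r_min / r
            s += eps * (ratio ** 12 - 2.0 * ratio ** 6)
            e += COULOMB * probe_charge * q / r
        steric.append(min(s, cap))
        electro.append(max(-cap, min(e, cap)))
    return steric + electro  # one table row: steric columns, then electrostatic


# Toy molecule: a single atom with charge +0.5 at the origin, 2.0 Angstrom grid.
grid = lattice((-2.0, -2.0, -2.0), (2.0, 2.0, 2.0), 2.0)
row = field_row([(0.0, 0.0, 0.0, 0.5)], grid)
far = field_row([(0.0, 0.0, 0.0, 0.5)], [(10.0, 0.0, 0.0)])
```

Even this small 3 x 3 x 3 grid yields 54 columns per molecule, which shows how a realistic grid quickly produces the thousands of columns mentioned in the text.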



One of the most attractive features of the 
CoMFA and CoMFA-like methods is that, be- 
cause of the nature of molecular field descrip- 
tors, these approaches yield models that are 
relatively easy to interpret in chemical terms. 
Famous CoMFA contour plots, which are obtained
as a result of any successful CoMFA
study, tell chemists in rather plain terms how
the change in the compounds' size or charge
distribution as a result of chemical modification
correlates with the binding constant or activity.
These observations may immediately
suggest to a chemist possible ways to modify
molecules to increase their potencies. However,
as demonstrated in the next section,
these predictions should be taken with caution
and only after sufficient work has been done to
prove the statistical significance and predictive
ability of the models.

By analogy with 2D atom-pair descriptors
(Fig. 2.4), 3D AP descriptors can also be defined
through the use of similar atom types
and atom pairs and 3D molecular topography;
in this case, a physical distance between atom
types is used in place of chemical graph distance.
The distance between two "atoms" is
measured and then assigned to one or two
distance bins. Typically, the width of each distance
bin is chosen as 1.0 Å. Because adjacent
bins are also designed to overlap each other by
10% of the bin width on each side, the actual
length of each distance bin is 1.2 Å. Any distance
located in the overlap region is assigned to both bins.
This "fuzzy distance" concept is adopted to
alleviate the possible unfavorable boundary
effects of the distance bins. For example, with
strict boundary conditions, a distance of 2.05
Å will be assigned only to bin No. 2, but it can
be reasonably argued that it is almost as close
to the upper half of bin No. 1 as to bin No. 2.
With fuzzy boundary conditions, 2.05 Å belongs
to both bin No. 1 and bin No. 2, allowing
a possible match to either. All distances
greater than 20 Å are assigned to the last
bin.
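
The fuzzy binning rule can be written down in a few lines. The sketch below is an assumption-laden illustration: it numbers bins from zero so that bin No. 2 nominally covers 2.0 to 3.0 Å (which matches the 2.05 Å example), extends each bin by 10% of its width on both sides, and labels the catch-all bin for distances beyond 20 Å as bin 20. The function name and the bin numbering are hypothetical.

```python
def fuzzy_bins(distance, width=1.0, overlap=0.1, cutoff=20.0):
    """Return the distance bin(s) to which a 3D interatomic distance is assigned.

    Bin k nominally covers [k*width, (k+1)*width) and is extended by
    overlap*width on each side; distances above `cutoff` go to the last bin.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    last = int(cutoff / width)
    if distance > cutoff:
        return [last]
    k = int(distance // width)
    offset = distance - k * width
    margin = overlap * width
    bins = [k]
    if k > 0 and offset < margin:
        bins.insert(0, k - 1)  # also inside the upper margin of the previous bin
    if k < last and offset >= width - margin:
        bins.append(k + 1)  # also inside the lower margin of the next bin
    return bins


both = fuzzy_bins(2.05)    # falls in the overlap of bins 1 and 2
single = fuzzy_bins(2.5)   # well inside bin 2 only
beyond = fuzzy_bins(25.0)  # past the cutoff, assigned to the last bin
```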

3 QSAR MODELING APPROACHES



3.1 3D-QSAR 

Two original 3D-QSAR methods, CoMFA (40) 
and GRID (110), were developed almost simultaneously
in the mid- to late 1980s (9). Since its
introduction, the CoMFA approach has rapidly
become one of the most popular methods of
QSAR. Over the years, this approach has been
applied to a wide variety of receptor and enzyme
ligands [many reviews appeared in a recent 
monograph (10)]. Undoubtedly, the further de- 
velopment of this and related methods is of great 
importance and interest to many scientists 
working in the area of rational drug design. 

CoMFA methodology is based on the as- 
sumption that because, in most cases, the 
drug-receptor interactions are noncovalent, 
the changes in the biological activities or bind- 
ing affinities of sample compounds correlate 
with changes in the steric and electrostatic 
fields of these molecules. In a standard 
CoMFA procedure, all molecules under inves- 
tigation are first structurally aligned, and the 
steric and electrostatic fields around them are 
sampled with probe atoms, usually $sp^3$ carbon
with a +1 charge, on a rectangular grid that 
encompasses aligned molecules. The results of 
the field evaluation in every grid point for ev- 
ery molecule in the data set are placed in the 
CoMFA QSAR table, which therefore contains 
thousands of columns (Fig. 2.5). The analysis 
of this table by the means of standard multiple 
regression is practically impossible; however, 
the application of special multivariate statistical
analysis routines, such as PLS analysis and
LOO cross-validation, ensures the statistical
significance of the final CoMFA equation. The
outcome from this procedure is a cross-validated
correlation coefficient $R^2$ ($q^2$), which is
calculated according to the formula

$$q^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} \qquad (2.1)$$

where $y_i$, $\hat{y}_i$, and $\bar{y}$ are the actual, estimated,
and averaged (over the entire data set) activities,
respectively. The summations in Equation
2.1 are performed over all compounds,
which are used to build a model for the training
set. The statistical meaning of the $q^2$ is
different from that of the conventional $r^2$; a
$q^2$ value greater than 0.3 is often considered
significant (111).
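
Leave-one-out cross-validation and the resulting $q^2$ can be illustrated with a small sketch. As a loudly stated simplification, the regression step here is an ordinary least-squares line on a single descriptor, standing in for the PLS analysis used in CoMFA; the function names and the toy data are hypothetical. Each compound is left out in turn, the model is refit on the remaining compounds, and the left-out activity is predicted.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Cross-validated q^2: 1 - PRESS / total sum of squares about the mean."""
    n = len(ys)
    mean_y = sum(ys) / n
    press = 0.0
    for i in range(n):
        # Refit without compound i, then predict its activity.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


# Toy data (invented): one exactly linear series and one slightly noisy series.
descriptor = [0.0, 1.0, 2.0, 3.0, 4.0]
q2_exact = loo_q2(descriptor, [1.0, 3.0, 5.0, 7.0, 9.0])
q2_noisy = loo_q2(descriptor, [1.0, 3.2, 4.9, 7.1, 8.8])
```

Unlike the conventional $r^2$, this quantity can become negative when the left-out predictions are worse than simply predicting the mean activity.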

Despite obviously successful and growing 
application of CoMFA in molecular design, 
several problems intrinsic to this methodology 
have persisted. Studies revealed that CoMFA 
results can be extremely sensitive to a number 
of factors, such as alignment rules, overall ori- 
entation, lattice placement, step size, and 
probe atom type (40, 75, 112-114). The prob- 
lem of three-dimensional alignment has been 
the most notorious among others. Even with 
the development of automated or semiauto- 
mated alignment protocols such as the Active 
Analog Approach (108, 115) or DISCO (116) 
and the opportunity to use, in some cases, the 
structural information about the target recep- 
tor (112, 117) to align molecules, in general 
there is no standard recipe as to how to align 
all molecules under consideration in a unique 
and unambiguous fashion. A QSAR analysis of 
60 acetylcholinesterase inhibitors (117) is par- 
ticularly illustrative with respect to this point. 
In that study, the combination of structure- 
based alignment and CoMFA was employed 
to obtain a QSAR model for 60 chemically di- 
verse inhibitors of acetylcholinesterase (AChE). 
The great structural diversity of the AChE in- 
hibitors, ranging from choline to decametho- 
nium, made it practically impossible to struc- 
turally align all the inhibitors in any unbiased 
way and generate a unique three-dimensional 
pharmacophore. X-ray crystallographic analysis
of AChE from Torpedo californica (EC 3.1.1.7)
(118), followed by X-ray determination of
the complexes of the enzyme with three
structurally diverse inhibitors, tacrine, edrophonium,
and decamethonium (119), provided
crucial information with respect to the
orientation of these inhibitors in the active
site of the enzyme. The crystallographic
data indicated that each of the three inhibitors
had a unique binding orientation in the
active site of the enzyme (Fig. 2.6). Their
natural structural alignment would probably
never have been predicted by any of the existing
automated algorithms for ligand alignment
or even by the researcher's imagination
on the basis of the ligand chemical structure
alone. This consideration demonstrates the
general difficulty of generating a unique and
meaningful alignment in 3D-QSAR studies
that leads to interpretable and predictive models.

Figure 2.6. Superposition of three inhibitors of
AChE in the active site of the enzyme based on crystallographic
structures of enzyme-inhibitor complexes.
Obviously, no common pharmacophore can
be found for these molecules.

The 3D alignment problem is the main 
source of ambiguity in obtaining and analyz- 
ing CoMFA results, especially in the case of 
structurally diverse compounds. However, it 
was also shown that, even if the structural
alignment is fixed, the resulting $q^2$ value could
also be sensitive to the orientation of rigidly
aligned molecules on the user's terminal (75),
which can be explained as follows.



The grid orientation in CoMFA is fixed in 
the coordinate system of the computer; thus, 
every time the orientation of the molecular
aggregate is changed, the size of the grid
may change but not its orientation. The orientation
of the assembled molecules therefore
affects the placement of probe atoms, which,
in turn, influences the field sampling process.
This leads to the variability of the $q^2$ values,
mostly attributable to the reasons outlined
earlier. The effect of variability of $q^2$ as a function
of molecular aggregate orientation was
more pronounced in the case of structurally 
diverse molecules (e.g., cephalotaxine esters 
and 5-HT„ receptor ligands) than in the case 
of much less structurally diverse molecules 
(e.g., HIV protease inhibitors) (75). This effect 
may be attributed to the fact that the pattern 
of probe atom placement with respect to the 
aligned molecules changes more dramatically 
when one changes the orientation of more 
structurally diverse molecules than it does 
when the data set is composed of structurally 
similar molecules. 

In the conventional CoMFA implementa- 
tion, the steric and electrostatic fields, which 
theoretically form a continuum, are sampled 
on a fairly coarse grid. As a result, these fields 
are represented inadequately, and the results 
are not strictly reproducible. Intuitively, de- 
creasing the grid spacing may increase the ad- 
equacy of sampling, as was suggested by Cra- 
mer et al. (120). Indeed, it was shown that 
decreasing the grid spacing from 2.0 to 1.0 Å
minimized the fluctuation in the observed $q^2$
values (75). Most probably, the reason for this
phenomenon is that the decrease in grid spac- 
ing increases the number of probe atoms, 
which in turn should raise the probability of 
placing the probe atoms in a region where the 
steric and electrostatic field changes can be 
best correlated with biological activity. How- 
ever, as was noticed by Cramer et al. (120), the 
increase in the number of probe atoms also 
increases the noise in PLS analysis and leads 
to a less statistically significant $q^2$ (121).

An important feature of conventional 
CoMFA routine is that it assumes equal sam- 
pling and a priori equal importance of all lat- 
tice points for PLS analysis, whereas the final 
CoMFA result actually emphasizes the limited
areas of three-dimensional space as important
for biological activity. Indeed, the deficiencies
of conventional CoMFA routine mentioned
earlier may be effectively dealt with by eliminating
from the analyses those areas of three-dimensional
space where changes in steric and
electrostatic fields do not correlate with
changes in biological activity. The $q^2$-GRS routine
was devised (75) to eliminate those areas
from the analysis based on the (low) value of
the $q^2$ obtained for such regions individually.
The major feature of this routine is that it 
optimizes the region selection for the final 
PLS analysis. In this regard, it is intellectually 
analogous to the GOLPE approach (74). 

3D-QSAR remains an active area of research
and method development. Several recent
approaches such as CoMSIA (45), QSiAR
(46), and GRIND (122) address the most notorious
CoMFA problems, those dealing with grid
artifacts. However, it should be kept in mind
that 3D-QSAR modeling is a difficult process. 
It is reasonably successful when underlying 
molecules are relatively rigid and similar, so 
that the identification of the 3D pharmaco- 
phore is straightforward. With the increased 
complexity and flexibility of molecules and a 
possibility of multiple mechanisms of binding 
with the receptor, the derivation of unambig- 
uous pharmacophore and unique alignment is 
sometimes practically impossible (as shown 
above in the case of AChE inhibitors), and ex-
treme care is important in trying to obtain
reproducible and validated QSAR models. 

3.2 The Descriptor Pharmacophore Concept 
and Variable Selection QSAR 

The term pharmacophore, introduced by Ehr-
lich in the early 1900s (123), originally
referred to the molecular framework that car-
ries (phoros) the essential features responsible
for a drug's (pharmacon) activity. Nowadays,
this term has almost the opposite meaning as 
applied to three-dimensional (3D) molecular 
structure. A 3D pharmacophore is defined as a 
collection of particular chemical features 
(functional groups) and their spatial arrange- 
ment, which define pharmacological specific- 
ity of a series of compounds (124). The phar- 
macophore concept assumes that structurally 
diverse molecules bind to their receptor site in
a similar way, with their pharmacophoric ele-
ments interacting with the same functional 
groups of the receptor. 

The pharmacophore concept plays a very 
important role in guiding the drug discovery 
process. Pharmacophore models help medici- 
nal chemists gain an insight into the key inter- 
actions between ligand and receptor when the 
receptor structure has not been determined 
experimentally. A pharmacophore can be used 
as a basis for the alignment rules in 3D-QSAR 
analysis for the lead compound optimization
(125). Furthermore, a pharmacophore can be
directly used as the search query for 3D data-
base mining, which is a common and efficient
approach for discovery of lead compounds
(126).

Pharmacophore identification refers to the 
computational way of identifying the essential 
3D structural features and configurations that 
are responsible for the biological activity of a 
series of compounds. It is computationally in- 
tensive, requiring searching two huge spaces: 
the available conformations for each com- 
pound and the possible correspondence (align- 
ment) between different compounds. A num- 
ber of approaches and computer programs 
have been specifically developed for pharma-
cophore identification including, for example,
Active Analog Approach, AAA (108, 127, 128),
Ensemble distance geometry (129), DISCO 
(116), Chem-X (130), Catalyst/Hypo (131, 
132), Catalyst/HipHop (133, 134), and 
Apex-3D (135). 

An obvious parallel can be established be- 
tween the identification of descriptors contrib- 
uting the most to the correlation with biologi- 
cal activity, and search for pharmacophoric 
elements, which are mainly responsible for 
the specificity of drug action. Indeed, individ- 
ual pharmacophoric elements are typically 
identified in the course of experimental struc- 
ture-activity studies. Considering molecules 
as a collection of substructures, pharmaco- 
phoric elements can also be viewed as specific 
chemical features selected from all chemical 
fragments present in a molecular data set. 
Thus, the selection of specific pharmacophoric 
features responsible for biological activity is 
directly analogous to the selection of specific 
chemical descriptors contributing to the most 
explanatory QSAR model. Frequently, the
QSAR modeling that involves descriptor (fea-
ture) selection is referred to as variable selec-
tion QSAR.

This consideration emphasizes the analogy 
between pharmacophore identification and 
variable selection QSAR. On the basis of this 
analogy, we now expand the notion of chemi-
cal pharmacophore to that of the more general
descriptor pharmacophore. We shall define de- 
scriptor pharmacophore as a special subset of 
molecular descriptors (of any nature, not only 
chemical functional groups) optimized in the 
process of variable selection QSAR, to achieve 
the most significant correlation between de- 
scriptor values and biological activity. 

Similar to the common areas of application 
of chemical pharmacophores, descriptor phar- 
macophores can be applied for database min- 
ing. First, a preconstructed QSAR model can 
be used as a means of screening compounds 
from existing databases (or virtual libraries) 
for high predicted biological activity. Alterna- 
tively, variables selected by QSAR optimiza- 
tion can be used for similarity searches to im- 
prove the performance of the rational library 
design or database mining methods. The ad- 
vantage of this approach for database mining 
is that it affords not only the compound selec- 
tion but also the quantitative prediction of 
their activity. 

3.2.1 Linear Models. Variable selection ap- 
proaches can be applied in combination with 
both linear and nonlinear optimization algo- 
rithms. Exhaustive analysis of all possible 
combinations of descriptor subsets to find a 
specific subset of variables that affords the 
best correlation with the target property is 
practically impossible because of the combina- 
torial nature of this problem. Thus, stochastic 
sampling approaches such as genetic or evolu- 
tionary algorithms (GA or EA) or simulated 
annealing (SA) are employed. To illustrate one 
such application we shall consider the GA-PLS 
method, which was implemented as follows 
(136). 

Step 1. Multiple descriptors such as molec- 
ular connectivity indices or atom pair descrip- 
tors (cf. Section 2.1) are generated initially for 
every compound in a data set. 

Step 2. An initial population of 100 differ-
ent random combinations of subsets of these
descriptors (parents) is generated as follows.
Each parent is described by a string of random
binary numbers (i.e., one or zero), with the
length (total number of digits) equal to the
total number of descriptors selected for each
data set. The value of one in each string im-
plies that the corresponding descriptor is in-
cluded for the parent, and the value of zero
implies that the descriptor is excluded.

Step 3. For every random combination of
descriptors (i.e., every parent), a QSAR equa-
tion is generated for the training data set by
use of the PLS algorithm (41). Thus, for each
parent a q² value is obtained, and some func-
tion of q² is used as a fitness function to guide
GA.

Step 4. Two parents are selected randomly 
and subjected to a crossover (i.e., the exchange 
of the equal length substrings), which pro- 
duces two offspring. Each offspring is sub- 
jected to a random single-point mutation, that 
is, a randomly selected one (or zero) is changed 
to zero (or one) and the fitness of each off- 
spring is evaluated as described above (cf. 
Step 3). 

Step 5. If the resulting offspring are char-
acterized by a higher value of the fitness func-
tion, then they replace the parents; otherwise,
the parents are kept.

Step 6. Steps 3-5 are repeated until a pre-
defined convergence criterion is achieved. For
the convergence criterion one can use the dif-
ference between the maximum and minimum
values of the fitness function. Calculations are 
terminated when this difference falls below a 
certain threshold (e.g., 0.02). 

In summary, each parent in this method 
represents a QSAR equation with randomly 
chosen variables, and the purpose of the calcu- 
lation is to evolve from the initial population 
of the QSAR equations to the population with 
the highest average value of the fitness func-
tion. In the course of the GA-PLS process, the
initial number of members of the population
(100) is maintained while the average value of
the fitness function for the whole population 
converges to a high number. The best model is 
characterized by the highest value of the fit- 
ness function as well as by specific descriptor 
selection (descriptor pharmacophore) that af- 
fords such a model. 
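
Steps 1-6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published GA-PLS implementation (136): ordinary least squares stands in for PLS, the fitness is the leave-one-out q² itself rather than some function of it, each offspring is compared with one of its two parents, and the names `loo_q2` and `ga_select`, the population size of 20, and the iteration cap are hypothetical choices made for the example.

```python
import numpy as np

def loo_q2(X, y, mask):
    """Leave-one-out q2 of a least-squares model built on the masked columns."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return -np.inf                          # an empty descriptor set is useless
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i           # leave compound i out
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def ga_select(X, y, pop_size=20, max_iter=300, tol=0.02, seed=0):
    """Evolve binary descriptor masks; assumes at least two descriptors."""
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n_desc))        # Step 2: random parents
    fit = np.array([loo_q2(X, y, p) for p in pop])           # Step 3: fitness
    for _ in range(max_iter):
        # Step 6: stop when best and worst fitness are within the threshold.
        if np.all(np.isfinite(fit)) and fit.max() - fit.min() < tol:
            break
        i, j = rng.choice(pop_size, size=2, replace=False)   # Step 4: pick parents
        cut = rng.integers(1, n_desc)                         # crossover point
        kids = [np.concatenate([pop[i][:cut], pop[j][cut:]]),
                np.concatenate([pop[j][:cut], pop[i][cut:]])]
        for parent, kid in zip((i, j), kids):
            kid[rng.integers(n_desc)] ^= 1                    # single-point mutation
            f = loo_q2(X, y, kid)
            if f > fit[parent]:                               # Step 5: keep the fitter
                pop[parent], fit[parent] = kid, f
    best = int(np.argmax(fit))
    return pop[best].astype(bool), float(fit[best])
```

The returned Boolean mask plays the role of the descriptor pharmacophore: it marks the descriptors retained by the fittest member of the final population.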

3.2.2 Nonlinear Models. Most of the QSAR
approaches assume the existence of a linear 
relationship between a biological activity and 
molecular descriptors. However, the fast col- 
lection of structural and biological data, as a 
consequence of the recent development of 
combinatorial chemistry and high throughput 
screening technologies, has challenged tradi- 
tional QSAR techniques. First, 3D methods 
may be computationally too expensive for the 
analysis of a large volume of data; and in some
cases, an automated and unambiguous align- 
ment of molecular structures is not achiev- 
able. Second, although existing 2D techniques 
are computationally efficient, the assumption 
of linearity in the SAR may not hold true, es- 
pecially when a large number of structurally 
diverse molecules are included in the analysis. 

These considerations provide an impetus 
for the development of fast, nonlinear, vari- 
able selection QSAR methods that can avoid 
the aforementioned problems of linear QSAR. 
Several nonlinear QSAR methods have been 
proposed in recent years. Most of these meth-
ods are based on either artificial neural net-
work (ANN) (50, 61, 137-142) or machine
learning techniques (65, 143-145). Given that
optimization of many parameters is involved
in these techniques, the analysis
is relatively slow. More recently, Hirst re-
ported a simple and fast nonlinear QSAR
method (146), in which the activity surface 
was generated from the activities of training 
set compounds based on some predefined 
mathematical function. 

For illustration, we shall consider here one 
of the nonlinear variable selection methods 
that adopts a k-Nearest Neighbor (kNN) prin- 
ciple to QSAR [kNN-QSAR (49)]. Formally, 
this method implements the active analog 
principle that lies at the foundation of
modern medicinal chemistry. The kNN-QSAR
method employs multiple topological (2D) or 
topographical (3D) descriptors of chemical 
structures and predicts biological activity of 
any compound as the average activity of k 
most similar molecules. This method can 
be used to analyze the structure-activity 
relationships (SAR) of a large number of 
compounds where a nonlinear SAR may 
predominate. 

In principle, the kNN technique is a con-
ceptually simple, nonlinear approach to pat-
tern-recognition problems (147). In this method,
an unknown pattern is classified according to 
the majority of the class labels of its k nearest 
neighbors of the training set in the descriptor 
space. Many variations of the kNN method 
have been proposed in the past and new and 
fast algorithms have continued to appear in 
recent years (148, 149). The applications of 
the kNN principle in chemistry have been 
summarized by Strouf (150). In the area of 
biology, Raymer et al. have successfully ap- 
plied a kNN pattern-recognition technique 
with simultaneous feature selection and clas- 
sification in the analysis of water distribution 
in protein structures (151). In the area of 
QSPR, Basak et al. have applied this principle, 
combined with principal component analysis 
and graph theoretical indices, in the estima- 
tion of physicochemical properties of organic 
compounds (152-155). 

The assumptions underlying the kNN- 
QSAR method are as follows. First, structur- 
ally similar compounds should have similar bi- 
ological activities, and the activity of a 
compound can be predicted (or estimated) 
simply as the average of the activities of simi- 
lar compounds. Second, the perception of 
structural similarity is relative and should al- 
ways be considered in the context of a partic- 
ular biological target. Given that the physico- 
chemical characteristics of the receptor- 
binding site vary from one target to another, 
the structural features that can best explain 
the observed biological similarities between 
compounds are different for different biologi- 
cal endpoints. These critical structural fea- 
tures can be defined as the descriptor pharma- 
cophore (DP) for the underlying biological 
activity. Thus, one of the tasks of building a 
kNN-QSAR model is to identify the best DP. 
This is achieved by the "bioactivity-driven" 
variable selection, that is, by selecting a subset 
of molecular descriptors that afford a highly 
predictive kNN-QSAR model. Because the 
number of all possible combinations of de- 
scriptors is huge, an exhaustive search of 
these combinations is not possible. Thus, a 
stochastic optimization algorithm (i.e., simu-
lated annealing) has been adopted for an effi-
cient sampling of the combinatorial space. Fig-
ure 2.7 shows the overall flowchart of the
kNN-QSAR method, which involves the fol-
lowing steps.

Figure 2.7. Flowchart of the kNN method (49).

1. Select a subset of n descriptors randomly (n
is a number between 1 and the total num-
ber of available descriptors) as a hypothet-
ical descriptor pharmacophore (HDP).

2. Validate this HDP by a standard cross-val-
idation procedure, which generates the
cross-validated R² (or q²) value for the
kNN-QSAR model built by use of this HDP.
The standard leave-one-out procedure has
been implemented as follows: (i) Eliminate
a compound from the training set. (ii) Cal-
culate the activity of the eliminated com-
pound, which is treated as an unknown, as
the average activity of the k most similar
compounds found in the remaining mole-
cules (k is set to 1 initially). The similarities
between compounds are calculated using
only the selected descriptors (i.e., the cur-
rent trial HDP) instead of the whole set of
descriptors. (iii) Repeat this procedure un-
til every compound in the training set has
been eliminated and predicted once. (iv)
Calculate the cross-validated R² (or q²)
value (cf. Equation 2.1). (v) Repeat calcula-
tions for k = 2, 3, 4, . . . , n. The upper limit
of k is the total number of compounds in
the data set; however, the best value is
found empirically between 1 and 5. The k
that leads to the best q² value is chosen for
the current kNN-QSAR model.

3. Repeat steps 1 and 2, the procedure of gener-
ating trial HDPs and calculating correspond-
ing q² values. The goal is to find the best HDP
that maximizes the q² value of the corre-
sponding kNN-QSAR model. This process is
driven by a generalized simulated annealing
by use of q² as the objective function.
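
The three steps above can be sketched as follows. This is an illustrative sketch, not the implementation of Ref. 49: a plain Metropolis-style annealing schedule stands in for the generalized simulated annealing mentioned in the text, similarity is taken as Euclidean distance in the selected descriptors, the number of selected descriptors is fixed in advance, and the function names, temperatures, and cooling rate are assumptions chosen for the example.

```python
import numpy as np

def knn_loo_q2(X, y, cols, k):
    """LOO q2 when each compound is predicted as the mean activity of its
    k nearest neighbors, with distances computed on the columns `cols` only."""
    Z = X[:, cols]
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # a compound is never its own neighbor
    nearest = np.argsort(d, axis=1)[:, :k]
    pred = y[nearest].mean(axis=1)
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

def best_k_q2(X, y, cols, k_max=5):
    """Step 2(v): try k = 1..k_max and keep the k with the highest q2."""
    scores = [knn_loo_q2(X, y, cols, k) for k in range(1, k_max + 1)]
    k = int(np.argmax(scores)) + 1
    return scores[k - 1], k

def anneal_descriptors(X, y, n_sel, t_start=1.0, t_end=1e-3,
                       cooling=0.9, steps_per_t=20, seed=0):
    """Search for the descriptor subset (trial HDP) with the highest q2.
    Requires n_sel to be smaller than the number of descriptors."""
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]
    cols = rng.choice(n_desc, size=n_sel, replace=False)     # Step 1: random HDP
    q2, k = best_k_q2(X, y, cols)                             # Step 2: validate it
    best = (q2, k, np.sort(cols))
    t = t_start
    while t > t_end:                                          # Step 3: iterate
        for _ in range(steps_per_t):
            trial = cols.copy()
            unused = np.setdiff1d(np.arange(n_desc), cols)
            trial[rng.integers(n_sel)] = rng.choice(unused)   # swap one descriptor
            q2_new, k_new = best_k_q2(X, y, trial)
            # Always accept improvements; accept worse moves with Boltzmann odds.
            if q2_new > q2 or rng.random() < np.exp((q2_new - q2) / t):
                cols, q2, k = trial, q2_new, k_new
                if q2 > best[0]:
                    best = (q2, k, np.sort(cols))
        t *= cooling
    return best
```

Calling `anneal_descriptors(X, y, n_sel=2)` returns the best q², the corresponding k, and the sorted indices of the selected descriptors.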

4 VALIDATION OF QSAR MODELS 

One of the most important characteristics of 
QSAR models is their predictive power. The 
latter can be defined as the ability of a model to 
predict accurately the target property (e.g., bi- 
ological activity) of compounds that were not 
used for model development. The typical prob-
lem of QSAR modeling is that at the time of
model development a researcher has, essen-
tially, only training set molecules, so predic-
tive ability can be characterized only by statis-
tical characteristics of the training set model,
and not by true external validation. Recent
research demonstrates that external valida-
tion must be made, indeed, a mandatory part
of model development. This goal can be
achieved by a division of an experimental SAR
data set into the training and test sets, which
are used for model development and valida-
tion, respectively.

Figure 2.8. Beware of q²! External R² (for the test set) presents no correlation with the "predictive"
LOO q² (for the training set). (Adapted from Ref. 163.)

It has been shown that the more indepen- 
dent variables are involved in MLR QSAR 
analysis, the higher the probability of a chance 
correlation between predicted and observed 
activities, even if only a small portion of vari- 
ables is included in the final QSAR equation 
(16). This conclusion is true not only for MLR 
QSAR, but also for any QSAR approach when 
the number of variables (descriptors) is com- 
parable to or higher than the number of com- 
pounds in a data set. Thus, model validation is 
one of the most important aspects of QSAR 
analysis. 

4.1 Beware of q²

To validate a QSAR model, most research-
ers apply the leave-one-out (LOO) or leave-
some-out (LSO) cross-validation procedures.
The outcome from this procedure is a cross-
validated correlation coefficient R² (q²) (Equa-
tion 2.1). Frequently, q² is used as a criterion
of both robustness and predictive ability of the
model. Many authors consider high q² (for in-
stance, q² > 0.5) as an indicator or even as the
ultimate proof of the high predictive power of
the QSAR model. They do not test the models
for their ability to predict the activity of com-
pounds of an external test set (i.e., compounds
that have not been used in the QSAR model
development). There are several examples of
recent publications, in which the authors
claim that their models have high predictive
ability without validating them by use of an
external test set (156-160). Some authors val-
idate their models by the use of only one or two
compounds that were not used in QSAR model
development (161, 162) and still claim that
their models are highly predictive. In contrast
with such expectations, it has been shown that
if a test set with known values of biological
activities is available for prediction, there ex-
ists no correlation between LOO cross-vali-
dated q² and correlation coefficient R² be-
tween the predicted and observed activities for
the test set [Fig. 2.8; (46, 163)].

4.2 Rational Selection of Training 
and Test Sets 

As discussed earlier, to obtain a reliable (vali-
dated) QSAR model, an available data set
should be divided into the training and test
sets. Ideally, this division must be performed 
such that points representing both training 
and test set are distributed within the whole 
descriptor space occupied by the entire data 
set, and each point of the test set is close to at 
least one point of the training set. This ap- 
proach ensures that the similarity principle 
can be employed for the activity prediction of 
the test set. Unfortunately, as we shall see be- 
low, this condition cannot always be satisfied. 

Many authors use external test sets for val- 
idation of QSAR models, but do not provide 
any rationale as to how and why certain com-
pounds were chosen for the test set (164, 165).
One of the most widely used methods for di- 
viding a data set into training and test sets is a 
mere random selection (166, 167). Some au- 
thors assign whole structural subgroups of 
molecules to the training set or the test set 
(168, 169). Another frequently used approach
is based on the activity sampling. The whole
range of activities is divided into bins, and
compounds belonging to each bin are ran-
domly (or in some regular way) assigned to the
training set or test set (170, 171). These meth-
ods (166, 170, 171) cannot guarantee that the
training set compounds represent the entire 
descriptor space of the original data set, and 
that each compound point of the test set is 
close to at least one point of the training set. 

In several publications, the division of a 
data set into training and test sets is per-
formed by use of Kohonen's Self-Organiz-
ing Map (SOM) (172). Representative points
falling into the same areas of the SOM are
randomly selected for the training and test 
sets (173, 174). SOM preserves the closeness 
between points (points that are close to each 
other in the multidimensional descriptor 
space are close to each other on the map). 
Therefore, it is anticipated that the training 
and test sets must be scattered within the 
whole area occupied by representative points 
in the original descriptor space, and that each 
point of the test set is close to at least one point 
of the training set. The drawback of this 
method is that the quantitative methods of 
prediction use exact values of distances be- 
tween representative points; because SOM is a 
nonlinear projection method, the distances be- 
tween points in the map are distorted. 



The division of a data set into the training 
and test sets can be performed by the use of 
various clustering techniques. In Burden and 
Winkler (175) and Burden et al. (176) the K- 
means clustering algorithm (177) was used, 
and from each cluster one compound for the 
training set was randomly selected. In Potter 
and Matter (178), to select a representative 
subset from a data set, hierarchical clustering 
and the maximum dissimilarity method (179- 
181) were used. The authors showed that both 
methods choose representative subsets of 
compounds much better than the random se- 
lection. Compounds selected through use of 
the maximum dissimilarity method were used 
as training sets in 3D-QSAR studies, with all 
remaining compounds composing the test set. 
In Wu et al. (166) the Kennard-Stone (182-
184) method, which is similar to the maximum
dissimilarity method, was applied to the clas- 
sification of NIR spectra and QSAR analysis. 
The drawbacks of clustering methods are that 
different clusters contain different numbers of 
points and have different densities of repre- 
sentative points. Therefore, the closeness of 
each point of the test set to at least one point of 
the training set is not guaranteed. The maxi- 
mum dissimilarity and Kennard-Stone meth- 
ods guarantee that the points of the training 
set are distributed more or less evenly within 
the whole area occupied by representative 
points, and the condition of closeness of the 
test set points to the training set points is sat- 
isfied. The maximum distance between train- 
ing and test set points in these methods does 
not exceed the radius of the probe sphere. 
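
A minimal sketch of a Kennard-Stone style selection is given below, assuming Euclidean distance in descriptor space and a common textbook form of the rule (start from the two most distant points, then repeatedly add the point farthest from everything already selected). The function name and the return convention are hypothetical, and the cited implementations (166, 182-184) may differ in detail.

```python
import numpy as np

def kennard_stone(X, n_train):
    """Return (train_indices, test_indices); requires n_train >= 2."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Seed the training set with the two mutually most distant points.
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [m for m in range(len(X)) if m not in selected]
    while len(selected) < n_train:
        # Distance from each remaining point to its closest selected point.
        min_d = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]   # the most isolated candidate
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

# Five compounds described by a single descriptor value.
X = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
train, test = kennard_stone(X, n_train=3)
print(train, test)   # [0, 4, 3] [1, 2]
```

In this toy case every test point lies within one or two descriptor units of a training point, which is the closeness condition discussed above.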

To select a representative subset of sam- 
ples from the whole data set, factorial designs 
(185, 186) and D-optimal designs (187) were 
used (166, 173, 188). Factorial designs pre- 
sume that different sample properties (such as 
substituent groups at certain positions) are di-
vided into groups. The training set includes
one representative for each combination of 
properties. For a diverse data set this ap- 
proach is impractical, and fractional factorial 
designs are used, in which only a part of all 
combinations is included into the training set. 
Generally, this approach does not guarantee 
the closeness of the test set points to the train- 
ing set points in the descriptor space. D-opti- 
mal design algorithms select samples that 
maximize the |X'X| determinant, where X'X is
the information (variance-covariance) matrix
of independent variables (descriptors) (189,
190). The points maximizing the |X'X| deter-
minant span the whole area oc-
cupied by representative points. They can be
used as a training set, and the points not se- 
lected then are used as the test set (166, 173). 

In Wu et al. (166) four methods of sample 
selection (random, SOM, Kennard-Stone de- 
sign, and D-optimal design) were compared. 
The best models were built when Kennard- 
Stone and D-optimal designs were used. SOM 
was better than random selection, and D-opti- 
mal design was slightly better than the ran- 
dom selection. 

4.3 Guiding Principles of Safe QSAR

A widely used approach to establish the model 
robustness is so-called y-randomization (ran- 
domization of response, i.e., in our case, activ- 
ities) (191). It consists of repeating the calcu- 
lation procedure with randomized activities 
and subsequent probability assessment of the 
resultant statistics. Frequently, it is used 
along with cross-validation. It is expected that 
models obtained for the data set with random-
ized activity should have low values of q²; oth-
erwise, the original model should be consid-
ered insignificant. We suggest that the
y-randomization test is a mandatory compo- 
nent of model validation. 
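
The y-randomization test can be sketched as below. The sketch assumes an ordinary least-squares model as a stand-in for whatever QSAR method is being validated, uses leave-one-out q² as the statistic, and reports an empirical p-value as one possible form of the probability assessment; the function names and the number of trials are hypothetical.

```python
import numpy as np

def loo_q2_ols(X, y):
    """LOO q2 of an ordinary least-squares model, via the hat-matrix shortcut."""
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.pinv(A)                   # hat (projection) matrix
    resid = y - H @ y
    loo_resid = resid / (1.0 - np.diag(H))      # exact LOO residuals for OLS
    return 1.0 - np.sum(loo_resid ** 2) / np.sum((y - y.mean()) ** 2)

def y_randomization(X, y, n_trials=100, seed=0):
    """Compare q2 of the real model with q2 obtained after shuffling activities."""
    rng = np.random.default_rng(seed)
    q2_true = loo_q2_ols(X, y)
    q2_rand = np.array([loo_q2_ols(X, rng.permutation(y))
                        for _ in range(n_trials)])
    # Fraction of shuffled models that do at least as well as the real one.
    p_value = (1 + np.sum(q2_rand >= q2_true)) / (1 + n_trials)
    return q2_true, q2_rand, p_value
```

A model passes the test when the q² values of the shuffled data sets are consistently low compared with the q² of the original data.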

Several authors have suggested that the 
only way to estimate the true predictive power 
of a QSAR model is to compare the predicted 
and observed activities of a (sufficiently
large) external test set of compounds that 
were not used in the model development (46, 
163, 192-194). To estimate the predictive 
power of a QSAR model, we recommend use
of the following statistical characteristics of
the test set (163): (i) correlation coefficient R
between the predicted and observed activities;
(ii) coefficients of determination (195) (pre-
dicted vs. observed activities R₀², and ob-
served vs. predicted activities R'₀²); (iii) slopes
k and k' of the regression lines through the
origin. We consider a QSAR model predictive,
if the following conditions are satisfied (163):

$$q^2 > 0.5 \tag{2.2}$$

$$R^2 > 0.6 \tag{2.3}$$

$$\frac{R^2 - R_0^2}{R^2} < 0.1 \quad \text{or} \quad \frac{R^2 - R_0^{\prime 2}}{R^2} < 0.1 \tag{2.4}$$

$$0.85 \le k \le 1.15 \quad \text{or} \quad 0.85 \le k' \le 1.15 \tag{2.5}$$

The lack of the correlation between q² and R²
was noted in Kubinyi et al. (46), Novellino et
al. (192), Norinder (193), and in our recent
publication (163), where we demonstrated
that all of the above-mentioned criteria are
necessary to adequately assess the predictive
ability of a QSAR model. We suggest (163) that
the external test set must contain at least five
compounds, representing the whole range of
both descriptors and activities of compounds
included in the training set.
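
As an illustration, the four conditions can be checked with a short function. The text does not spell out the formulas for the regressions through the origin, so the sketch below assumes one common convention: for a fit of a on b, the slope is Σab/Σb² and the coefficient of determination is 1 − Σ(a − slope·b)²/Σ(a − ā)². Because conditions 2.4 and 2.5 are symmetric in the two regressions, the outcome does not depend on which one is labeled as primed. The function names are hypothetical.

```python
import numpy as np

def through_origin(a, b):
    """Slope and coefficient of determination for the fit a ~ slope * b."""
    slope = np.sum(a * b) / np.sum(b * b)
    r0_sq = 1.0 - np.sum((a - slope * b) ** 2) / np.sum((a - a.mean()) ** 2)
    return slope, r0_sq

def is_predictive(q2, y_obs, y_pred):
    """Apply conditions 2.2-2.5 to a training-set q2 and test-set predictions."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r_sq = np.corrcoef(y_obs, y_pred)[0, 1] ** 2
    k, r0_sq = through_origin(y_obs, y_pred)
    k_prime, r0_prime_sq = through_origin(y_pred, y_obs)
    checks = {
        "2.2": q2 > 0.5,
        "2.3": r_sq > 0.6,
        "2.4": min(r_sq - r0_sq, r_sq - r0_prime_sq) / r_sq < 0.1,
        "2.5": (0.85 <= k <= 1.15) or (0.85 <= k_prime <= 1.15),
    }
    return all(checks.values()), checks

# A well-calibrated test set passes; a constant offset of 3 units fails on 2.5.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
ok, _ = is_predictive(0.7, obs, [1.1, 1.9, 3.2, 3.8, 5.1])
biased, detail = is_predictive(0.7, obs, [4.0, 5.0, 6.0, 7.0, 8.0])
```

The second case shows why R² alone is insufficient: predictions shifted by a constant correlate perfectly with the observations yet are systematically wrong.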



5 QSAR MODELS AS VIRTUAL 
SCREENING TOOLS 

5.1 Data Mining and SAR Analysis

Data mining has been of interest to research- 
ers in machine learning, pattern recognition, 
artificial intelligence, database statistics, and 
so forth for many years, and has been widely
applied in science, business, and government.
Now, chemoinformaticians have also started to
plunge into this field because of the increased quan-
tity of data in the drug discovery process. Data
mining can be defined as the process of discov- 
ering valid, novel, understandable, and poten- 
tially useful patterns in data (196, 197). Data 
mining is an interactive and iterative, multi- 
ple-step process, involving the decisions made 
by the user. It may include data collection, 
data cleaning, data engineering, algorithm en- 
gineering, algorithm running, result evalua- 
tion, and knowledge utilization (198, 199). 

Data mining methods can be generally di- 
vided into two types, unsupervised and super- 
vised. Whereas unsupervised methods seek in- 
formative patterns, which directly display the 
interesting relationship among the data, su-
pervised methods discover predictive patterns,
which can be used later to predict one or more 
attributes from the rest. 

A wide variety of supervised data mining 
methods have been applied for analyzing
structure-activity data sets, besides the tradi-
tional linear regression methods. Most of
them are nonlinear and nonparametric and 
need no statistical assumptions to apply them. 
Decision tree and rule induction methods, 
such as ID3 (200), CART (201), and FIRM
(202-204) usually use univariate splits to gen- 
erate a model in the form of a tree or proposi- 
tional logic. The inferred model is easy to com- 
prehend, but the approximation power may be 
significantly restricted by a particular tree or 
rule representation. Inductive logic program-
ming methods, such as GOLEM (64) and PRO-
GOL (65), are designed to induce a model from
the more flexible representation of first-order 
predicate logic. However, this generality 
comes at the price of significant computational 
demands. Nonlinear regression and classifica- 
tion methods, such as various neural networks 
(60-62), train a model by fitting linear and 
nonlinear combinations of basis functions to 
the combinations of the input variables. They 
may be powerful in terms of approximation, 
but they are statistically poorly characterized, 
slow (205), and difficult to interpret in chemi- 
cal terms. Example-based methods, such as 
nearest-neighbor methods (147), use repre- 
sentative examples from the database as an 
approximate model and predict new sam-
ples on the basis of the properties of the most
similar examples in the model. They are as- 
ymptotically powerful for approximating 
properties, but also difficult to interpret. Fur-
thermore, their performance is strongly de-
pendent on a well-defined distance metric to
evaluate distances between data points. 

Data mining of chemical databases is still 
at its very early stage. Nevertheless, as a re- 
sult of the data explosion in pharmaceutical 
industry, it is expected that data mining tech- 
niques will play an increasingly important role 
in the drug discovery process. Future studies
may include, for example, the definition of 
chemical space, the validation of various algo- 
rithms (206), and the representation of ex- 
tremely large virtual databases (207). 

5.2 Virtual Screening 

Although combinatorial chemistry and HTS 
have offered medicinal chemists a much 
broader range of possibilities for lead discov-
ery and optimization, the number of chemical
compounds that can be reasonably synthe-
sized, which is sometimes called "virtual
chemistry space," is still far beyond today's 
capability of chemical synthesis and biological 
assay. Therefore, medicinal chemists continue 
to face the same problem as before: Which 
compounds should be chosen for the next 
round of synthesis and testing? For chemoin-
formaticians, the task is to develop and utilize
various computer programs to evaluate a very 
large number of chemical compounds and rec- 
ommend the most promising ones for bench 
medicinal chemists. This process can be called 
virtual screening (208) or chemical database 
searching. A large number of computational 
methods exist for virtual screening, but which 
one is chosen will depend on the information 
available and the task at hand in practice. 

A substructure search will typically be un- 
dertaken if a lead compound has been found. 
The search query will retrieve all the struc- 
tures in a database that contain the substruc- 
tures present in the lead compound that are 
believed to be important for activity (209). Ac- 
cording to graph theory, it is equivalent to 
searching a series of topological graphs for the 
existence of a subgraph isomorphism with a 
specified query graph. Subgraph isomorphism
is an NP-complete problem (210), which
means that no algorithm is known
whose worst-case time requirements do not
rise exponentially with the size of the input.
However, various backtracking algorithms 
(211-213) and partitioning algorithms (214- 
217) have been developed since the 1950s, to 
reduce the average time required for chemical 
substructure searching. Today, almost all the 
chemical database software includes the func- 
tion of substructure searching. 
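
To make the graph formulation concrete, the sketch below performs a naive backtracking substructure match. It is an illustration only and not one of the cited algorithms (211-217): atoms are plain element symbols, bonds are unordered index pairs with no bond orders, and only the query bonds need to be present in the target. The function name and the data layout are assumptions made for the example.

```python
def find_substructure(q_atoms, q_bonds, t_atoms, t_bonds):
    """Map each query atom to a distinct target atom so that every query bond
    is also a target bond. Returns {query_index: target_index} or None."""
    t_edges = {frozenset(b) for b in t_bonds}
    q_adj = {i: set() for i in range(len(q_atoms))}
    for u, v in q_bonds:
        q_adj[u].add(v)
        q_adj[v].add(u)

    def extend(mapping):
        qi = len(mapping)                      # next query atom to place
        if qi == len(q_atoms):
            return dict(mapping)               # every query atom is mapped
        used = set(mapping.values())
        for ti, element in enumerate(t_atoms):
            if ti in used or element != q_atoms[qi]:
                continue
            # Bonds to already-placed query neighbors must exist in the target.
            if all(frozenset((mapping[qj], ti)) in t_edges
                   for qj in q_adj[qi] if qj in mapping):
                mapping[qi] = ti
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[qi]                # backtrack and try the next atom
        return None

    return extend({})

# Query C-O against the heavy atoms of ethanol, C-C-O.
ethanol_atoms, ethanol_bonds = ["C", "C", "O"], [(0, 1), (1, 2)]
print(find_substructure(["C", "O"], [(0, 1)], ethanol_atoms, ethanol_bonds))
# {0: 1, 1: 2}
```

The first candidate (query carbon on target atom 0) is rejected because atom 0 is not bonded to the oxygen, which is the backtracking step in action.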

A similarity search provides a way forward 
by retrieving the structures that are similar, 
but not identical, to a lead compound (94). 
Therefore, it overcomes some limitations of 
substructure search, for example, not requir- 
ing specific knowledge about the substruc- 
tures responsible for activity, and being able 
to rank the output structures according to the 
overall similarity. The search query usually 
involves a set of descriptors that collectively 
specify the whole structure of the lead com- 
pound. This set of descriptors is compared 
with the corresponding set of descriptors for 
each compound in the database, and then a
measure of similarity is calculated between 
them. There are a wide variety of molecular 
descriptors for similarity searching (cf. Sec-
tion 2). No single set of molecular descrip-
tors has been found to be the best choice in all
cases. The present trend in descriptor selec-
tion is to use combined descriptors of many
different types. The similarity coefficients
that are often used for measuring the similar-
ity between two structures include Manhat-
tan distance, Euclidean distance, Soergel dis-
tance, Tanimoto coefficient, Dice coefficient,
Cosine coefficient, and so forth (218), and
again no clear-cut winner has been found 
among them (219). Virtual screening based on 
QSAR models can serve as a powerful ap- 
proach to the design of targeted chemical li- 
braries, as illustrated in the following section. 
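
For binary descriptors, the coefficients named above reduce to simple counts of shared and unshared bits. The sketch below assumes, purely for illustration, that each structure is represented by the set of bit positions switched on in a fingerprint; the function names and the three-compound database are hypothetical, and empty fingerprints are not handled.

```python
from math import sqrt


def compare_fingerprints(fp_a, fp_b):
    """Common (dis)similarity measures for two non-empty sets of on-bits."""
    a, b = len(fp_a), len(fp_b)      # bits set in each structure
    c = len(fp_a & fp_b)             # bits set in both
    differing = a + b - 2 * c        # bits set in exactly one structure
    union = a + b - c                # bits set in at least one structure
    return {
        "manhattan": differing,
        "euclidean": sqrt(differing),
        "soergel": differing / union,
        "tanimoto": c / union,
        "dice": 2 * c / (a + b),
        "cosine": c / sqrt(a * b),
    }


def rank_by_tanimoto(probe, database):
    """Rank database entries by decreasing Tanimoto similarity to the probe."""
    scored = [(compare_fingerprints(probe, fp)["tanimoto"], name)
              for name, fp in database.items()]
    return sorted(scored, reverse=True)


probe = {1, 2, 3, 4}
database = {"a": {1, 2, 3, 4}, "b": {3, 4, 5, 6, 7, 8}, "c": {1, 2, 3}}
print(compare_fingerprints(probe, database["b"]))
print(rank_by_tanimoto(probe, database))
```

For binary data the Soergel distance is the complement of the Tanimoto coefficient, so these two measures always give the same ranking; the others can order the same hits differently.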

5.3 Rational Library Design by use of QSAR 

As discussed earlier, combinatorial chemical 
synthesis and high throughput screening have 
significantly increased the speed of the drug 
discovery process (220-222). However, it
remains impossible to synthesize all of the
library compounds in a reasonably short period
of time. For instance, $3000^{3}$ ($2.7 \times 10^{10}$)
compounds can be synthesized from a molecular
scaffold with three different substitution
positions when each of the positions has 3000
different substituents. If a chemist could
synthesize 1000 compounds per week, 27 million
weeks (~0.5 million years) would be required
to synthesize all these compounds. Furthermore,
many of these compounds can be structurally
similar to each other, thus making redundant
the chemical information contained
in the library. There is a need for rational
library design (i.e., rational selection of a subset
of available building blocks for combinatorial 
chemical synthesis), so that a maximum 
amount of information can be obtained while a 
minimum number of compounds are synthe- 
sized and tested. Similarly, there is a closely 
related task in computational database min- 
ing, that is, rational selection of a subset of 
compounds from commercially available or 
proprietary databases for biological testing. 
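
The size estimate quoted above is easy to check. The short calculation below simply repeats the arithmetic for the three-position scaffold; the variable names are ours, and a 52-week year is assumed.

```python
substituents_per_position = 3000
positions = 3
compounds_per_week = 1000            # synthesis rate assumed in the text

library_size = substituents_per_position ** positions
weeks = library_size / compounds_per_week
years = weeks / 52                   # assuming 52 weeks per year

print(f"{library_size:.1e} compounds")
print(f"{weeks:.1e} weeks, about {years:.1e} years")
```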

Thus, in many practical cases, the exhaustive
synthesis and evaluation of combinatorial
libraries is prohibitively expensive,
time-consuming, or redundant (223). Modern rational
approaches to the design of combinatorial li- 
braries have been explored in a recent mono- 
graph (224). Theoretical analysis of available 
experimental information about the biological 
target or pharmacological compounds capable 
of interacting with the target can significantly 
enhance the rational design of targeted chem- 
ical libraries. In many cases, the number of 
compounds with known biological activity is 
sufficiently large to develop viable QSAR mod- 
els for such data sets. These models can be 
used as a means of selecting virtual library 
compounds (or actual compounds from exist- 
ing databases) with (high) predicted biological 
activity. Alternatively, if a variable selection 
method has been employed in developing a 
QSAR model, the use of only selected variables 
can improve the performance of the rational 
library design or database mining methods on 
the basis of the similarity to a probe. Using
only the selected variables in a
similarity search in the descriptor space is
analogous to the more traditional use of
conventional chemical pharmacophores in database
mining.

QSAR models can be employed for rational 
design of targeted chemical libraries and data- 
base mining by predicting biologically active 
structures in virtual or actual chemical librar- 
ies (225-227). To illustrate this approach, we 
consider the design of a pentapeptide combi- 
natorial library with the bradykinin activity 
by use of a QSAR model derived for a small 
bradykinin peptide data set. Figure 2.9 shows 
the schematic diagram illustrating the tar- 
geted pentapeptide combinatorial library de- 
sign by use of the FOCUS-2D method (225, 
226). The algorithm includes the description, 
evaluation, and optimization steps. 

To identify potentially active compounds in 
the virtual library, FOCUS-2D employs sto- 
chastic optimization methods such as SA (228, 
229) and GA (230-232). The latter algorithm 
was used for targeted pentapeptide library de- 
sign as follows. Initially, a population of 100 
peptides is randomly generated and encoded 
by use of topological indices or amino acid- 
dependent physicochemical descriptors. The 
fitness of each peptide is evaluated by its
biological activity predicted from a
preconstructed QSAR equation (see below). Two
parent peptides are chosen by use of the roulette
wheel selection method (i.e., parents with
higher fitness are more likely to be selected).
Two offspring peptides are generated by a
crossover (i.e., two randomly chosen peptides
exchange their fragments) and mutations (i.e., a
randomly chosen amino acid in an offspring is
changed to any of 19 remaining amino acids).
The fitness of the offspring peptides is then
evaluated and compared with that of the
parent peptides, and the two lowest scoring
peptides are eliminated. This process is repeated
2000 times to evolve the population.

Figure 2.9. Flowchart of the library design approach by FOCUS-2D.
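
The selection, crossover, mutation, and elimination cycle just described can be sketched in a few lines. This is not the FOCUS-2D code: the scoring function is a hypothetical stand-in for the QSAR equation (random per-position weights), and the single-point crossover and the 10% mutation rate are assumptions made for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
LENGTH = 5


def make_toy_scorer(rng):
    """Hypothetical stand-in for a QSAR equation: random per-position weights."""
    table = [{aa: rng.random() for aa in AMINO_ACIDS} for _ in range(LENGTH)]
    return lambda pep: sum(table[i][aa] for i, aa in enumerate(pep))


def random_population(rng, size=100):
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(LENGTH))
            for _ in range(size)]


def mutate(pep, rng, rate=0.1):
    """With probability `rate`, change one residue to one of the 19 others."""
    if rng.random() < rate:
        pos = rng.randrange(LENGTH)
        new = rng.choice([aa for aa in AMINO_ACIDS if aa != pep[pos]])
        pep = pep[:pos] + new + pep[pos + 1:]
    return pep


def evolve(population, score, rng, steps=2000):
    pop = list(population)
    indices = range(len(pop))
    for _ in range(steps):
        fitness = [score(p) for p in pop]
        # Roulette wheel: selection probability proportional to fitness.
        i = rng.choices(indices, weights=fitness)[0]
        j = i
        while j == i:
            j = rng.choices(indices, weights=fitness)[0]
        # Single-point crossover followed by optional mutation.
        cut = rng.randrange(1, LENGTH)
        children = [mutate(pop[i][:cut] + pop[j][cut:], rng),
                    mutate(pop[j][:cut] + pop[i][cut:], rng)]
        # Keep the two best of parents plus offspring; drop the two worst.
        family = sorted([pop[i], pop[j]] + children, key=score, reverse=True)
        pop[i], pop[j] = family[0], family[1]
    return pop


rng = random.Random(0)
score = make_toy_scorer(rng)
start = random_population(rng)
final = evolve(start, score, rng)
print(max(start, key=score), "->", max(final, key=score))
```

Because the two best members of each family always survive, the best score in the population can never decrease from one cycle to the next.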

Design of a Targeted Library with Bradykinin 
(BK) Potentiating Activity. The results obtained 
with FOCUS-2D and a QSAR-based prediction
are shown in Figure 2.10. The position-dependent
frequency distributions of amino acids in the
highest scoring pentapeptides are shown before
(Fig. 2.10a) and after (Fig. 2.10b, c) FOCUS-2D.
To evaluate the efficiency of stochastic
sampling, the entire pentapeptide library
(which includes as many as 3.2 million
molecules) was also generated and subjected to
evaluation by use of the same QSAR model, and
the results are shown in Fig. 2.10c. Apparently,
the results after FOCUS-2D and the exhaustive
search were very similar to each other.
FOCUS-2D selected the following amino acids:
E, I, K, L, M, Q, R, V, and W. Interestingly,
these selected amino acids included most of
those found in the two experimentally most
active pentapeptides, VEWAK and VKWAP
(excluded from the training set for the QSAR
model development). Furthermore, the actual
spatial positions of these amino acids were
correctly identified: the first and fourth
positions for V; the second and fifth positions
for E; the third position for W; and the second
and fifth positions for K. More detailed
analysis of these results (cf. Fig. 2.10b,c) may
suggest which residues should be preferably
chosen for each position in the pentapeptide to
achieve a limited-size library with high
predicted bradykinin activity.

6 CONCLUSIONS

In this chapter, we have reviewed recent and 
developing trends in the field of QSAR. We 
have provided common terminology and pre- 
sented a unified concept of the QSAR ap- 
proach. We have emphasized that, regardless 
of the origin of molecular descriptors, any 
QSAR modeling exercise starts from constructing
a two-dimensional data array (Fig.
2.2), which lists molecular IDs, values of the 
target (or dependent) property of each com- 
pound, and values of descriptors (independent 
variables) for each compound. We have consid- 
ered various protocols employed by QSAR 
practitioners to develop quantitative models 
of biological activity by the use of chemical 
descriptors and linear or nonlinear optimization
techniques. We have particularly emphasized
that the true power of any QSAR model
comes from its statistical significance and the
model's ability to predict accurately biological
properties of chemical compounds both in the
training and, most important, in the test sets.
One of the important research challenges in
QSPR modeling remains finding descriptor
types, correlation approaches, and adequate
statistical characteristics of the training
set only, which may ensure high predictive
power of the models.

Figure 2.10. Rational selection of building blocks for library design by use of FOCUS-2D and a QSAR
model for activity prediction: (a) initial population; (b) final population after FOCUS-2D; and (c) final
population after the exhaustive search.

In conclusion, we strongly advocate rigor- 
ous validation of QSAR models before their 
practical application or interpretation. The 
practical guidelines for the development of 
statistically robust and predictive QSAR mod- 
els can be summarized as follows: 



1. Establish an SAR database through the use 
of reliable quantitative measurements of 
the target property and a preferred set of 
molecular descriptors. 

2. Divide the underlying data set into training 
and test sets through the use of diversity 
sampling algorithms. 

3. Develop training set models through the 
use of available QSAR methods or commer- 
cial software. Characterize these models 
with internal validation parameters, as dis- 
cussed in this chapter, and define the appli- 
cability domain for each model. 

4. Validate training set models through the
use of an external test set and calculate the
external validation parameters, as discussed
in this chapter. Ideally, repeat the
procedure of training and test selection and
external validation several times to identify
the QSAR model for the smallest training
set that affords adequate prediction
power for the biggest test set.

5. Finally, explore and exploit validated 
QSAR models for possible mechanistic in- 
terpretation and prediction. 
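
Steps 2 through 4 of these guidelines can be illustrated with a deliberately small example. Everything in the sketch below is an assumption made for illustration: the data are synthetic (one descriptor, an invented linear response with a small alternating error), the split takes every fourth compound after ranking by descriptor value as a crude stand-in for a diversity sampling algorithm, and the model is an ordinary one-variable least-squares line.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single descriptor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(observed, predicted):
    """Coefficient of determination for a set of predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def split_by_rank(data, every=4):
    """Send every `every`-th compound (ranked by descriptor) to the test set."""
    ordered = sorted(data)
    train = [d for k, d in enumerate(ordered) if (k + 1) % every != 0]
    test = [d for k, d in enumerate(ordered) if (k + 1) % every == 0]
    return train, test


# Synthetic (descriptor, activity) pairs; not experimental data.
data = [(x, 2.0 * x + 1.0 + (0.1 if x % 2 else -0.1)) for x in range(12)]

train, test = split_by_rank(data)
slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
predicted = [slope * x + intercept for x, _ in test]
external_r2 = r_squared([y for _, y in test], predicted)
print(len(train), len(test), round(external_r2, 4))
```

The model never sees the test compounds during fitting, so the final statistic measures prediction rather than fit. Repeating the split with different test compounds, as item 4 recommends, would show how stable that statistic is.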

In the modern age of medicinal chemistry,
QSAR modeling remains one of the most im- 
portant instruments of computer-aided drug 
design. Skillful application of various method- 
ologies discussed in this chapter will afford 
validated QSAR models, which should con- 
tinue to enrich and facilitate the experimental 
process of drug discovery and development.

1 INTRODUCTION

By historical imperative, the role of molecular 
modeling in drug design has been divided into 
two separate paradigms, one centered on the 
structure-activity problem that attempts to 
rationalize biological activity in the absence of 
detailed, three-dimensional structural infor- 
mation about the receptor, and the other fo- 
cused on understanding the interactions seen 
in receptor-ligand complexes and using the 
known three-dimensional structure of the 
therapeutic target to design novel drugs. The
rapid increase in relevant structural
information, attributed to advances in molecular
biology to generate the target proteins in
adequate quantities for study, and the equally
impressive gains in NMR (1-9) and
crystallography (10, 11) to provide three-dimensional
structures as well as identify leads, have
stimulated the need for design tools, and the
molecular modeling community is rapidly evolving
useful approaches. The more common problem,
however, is one in which the receptor can
only be inferred from pharmacological studies
and little, if any, structural information is
available to guide in modeling. Nevertheless,
useful information to guide the design and
synthesis of potential novel therapeutics can
be developed from an analysis of
structure-activity data in the three-dimensional
framework provided by current molecular modeling
techniques. Although most of the techniques
and approaches described have broader
application than shown, the examples chosen
should be sufficient to illustrate their use. A
number of reviews (12-18) of computer-aided
drug design have relevant sections covering
portions of this chapter with different
perspectives and are recommended for a more
complete overview.

2 BACKGROUND AND METHODS 

2.1 Molecular Mechanics

Molecular mechanics (19) treats a molecule as 
a collection of atoms whose interactions can be 
described by Newtonian mechanics. Because 
the mass of the nuclei is much greater than
the mass of the electrons, one can separate
(the Born-Oppenheimer approximation) the
Schrödinger equation into a product of two
functions: one for electrons, one for nuclei.
For the purposes of molecular mechanics, the
electronic function, initially developed to
interpret spectroscopic data, is ignored; that is,
the charge distribution is assumed to remain
constant during changes in the position of the
nuclei. Because molecular mechanics is based
on classical physics, it cannot provide
information about the electronic properties of
molecules under study, which are generally assumed
fixed during the parameterization of the force
field with experimental data.

A few words about the basics of molecular 
mechanics (19, 20) may provide the elements 
of understanding for what follows. This is not 
meant to be comprehensive, but rather a simple
overview, to remind the reader of a few
crucial points. For a comprehensive overview
of molecular modeling, the reader is referred
to the excellent text by Leach (21). The
interactions between atoms are divided into
bonded and nonbonded classes. Nonbonded
forces between atoms are based on an attractive
interaction that has a firm theoretical basis
and varies as the inverse of the 6th power of
the distance between the atoms. It is balanced
by a repulsion between the electronic clouds as
the atoms come close, and this interaction has
been represented empirically by a variety of
functional forms: exponential, 12th power, or
9th power of the distance between the atoms.
The coefficients for these two interactions are
parameterized for atom types, usually by
element, so that the minimum of the combined
functions corresponds to the sum of the
experimental van der Waals radii for the two atoms.
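
For the common case of a 12th-power repulsion combined with the 6th-power attraction, the relation between the two coefficients and the position of the minimum can be written down directly. The sketch below assumes the form $C_{12}/r^{12} - C_{6}/r^{6}$; the contact distance of 3.4 Å and the well depth of 0.1 kcal/mol are hypothetical values chosen only to make the example concrete.

```python
def lj_coefficients(r_min, well_depth):
    """Coefficients that place the minimum at r_min with energy -well_depth."""
    c12 = well_depth * r_min ** 12
    c6 = 2.0 * well_depth * r_min ** 6
    return c12, c6


def lj_energy(r, c12, c6):
    """12-6 nonbonded energy: repulsion minus dispersion attraction."""
    return c12 / r ** 12 - c6 / r ** 6


# Hypothetical pair: sum of van der Waals radii 3.4 Angstrom, well 0.1 kcal/mol.
c12, c6 = lj_coefficients(3.4, 0.1)

# Setting dV/dr = 0 gives r_min = (2*C12/C6)**(1/6).
r_min = (2.0 * c12 / c6) ** (1.0 / 6.0)
for r in (3.0, 3.4, 4.0, 6.0):
    print(r, round(lj_energy(r, c12, c6), 4))
```

The energy rises steeply inside the contact distance and decays slowly outside it, which is the behavior the parameterization is designed to reproduce.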

In addition, bonded atoms are considered 
as a special case, with a "spring constant" de- 
termining the energy of deformation from ex- 
perimental bond lengths. Atoms directly 
bonded to the same atom (one-three interac- 
tions) are eliminated from the van der Waals 
list and have a special energetic term relating 
the deviation from an ideal bond angle. Atoms 
having a one-four interaction define a tor- 
sional relation that is usually parameterized 
based on the types of the four connected atoms 
defining the torsion angle. The numerous 
combinations of atom types require an enor- 
mous number of parameters to be determined 
from either theoretical (quantum mechanics) 
and/or experimental data. Simplified force 
fields in which the torsional parameters de- 
pend only on the atoms at the end of a bond 
have been developed, to give approximate ge- 
ometries for further refinement by quantum 
mechanics. 

2.1.1 Force Fields. The basic assumption 
underlying molecular mechanics is that classi- 
cal physical concepts can be used to represent 
the forces between atoms. In other words, one 
can approximate the potential energy surface 
by the summation of a set of equations repre- 
senting pairwise and multibody interactions. 
These equations represent forces between
atoms related to bonded and nonbonded
interactions. Pairwise interactions are often
represented by a harmonic potential
$[\tfrac{1}{2}K_b(b - b_0)^2]$
that obeys Hooke's law (derived for a spring)
for bonded atoms, restoring the bond distance
to an equilibrium value $b_0$, and a van der
Waals potential $[C_{12}(i,j)/r_{ij}^{12} - C_{6}(i,j)/r_{ij}^{6}]$ for
nonbonded atoms. Similarly, distortion from
an equilibrium valence angle ($\theta_0$) describing
the angle between three bonded atoms sharing
a common atom is also penalized
$[\tfrac{1}{2}K_\theta(\theta - \theta_0)^2]$.

A third class of interaction dependent on
the dihedral angle $\phi$ between four bonded
atoms is the torsional potential
$\{K_\phi[1 + \cos(n\phi - \delta)]\}$
used to account for orbital delocalization
and to compensate for other deficiencies in the
force field. A harmonic term $[\tfrac{1}{2}K_\xi(\xi - \xi_0)^2]$ is
often introduced for dihedral angles $\xi$ that are
relatively fixed, such as those in aromatic
rings. Coulomb's law $[q_i q_j/(4\pi\epsilon_0\epsilon_r r_{ij})]$ is the
simplest approach to the contribution of
electrostatics to the potential V:

$$
V = \sum \tfrac{1}{2}K_b(b - b_0)^2 + \sum \tfrac{1}{2}K_\theta(\theta - \theta_0)^2
  + \sum \tfrac{1}{2}K_\xi(\xi - \xi_0)^2 + \sum K_\phi[1 + \cos(n\phi - \delta)]
  + \sum [C_{12}(i,j)/r_{ij}^{12} - C_{6}(i,j)/r_{ij}^{6}]
  + \sum q_i q_j/(4\pi\epsilon_0\epsilon_r r_{ij}).
$$
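
The potential V is simply a sum of independent terms, which the following sketch makes explicit. The function names, the tuple layout of the input lists, the torsional multiplicity n, and the unit system (kcal/mol, Å, and electron charges, with $1/(4\pi\epsilon_0)$ taken as approximately 332.06) are assumptions made here and do not correspond to any particular force field.

```python
from math import cos, pi

# Approximate value of 1/(4*pi*eps0) in kcal*Angstrom/(mol*e^2); assumed units.
COULOMB_CONSTANT = 332.06


def harmonic(k, x, x0):
    """Hooke's-law term used for bonds, valence angles, and fixed dihedrals."""
    return 0.5 * k * (x - x0) ** 2


def torsion(k_phi, n, phi, delta):
    """Periodic torsional term; n is the assumed multiplicity."""
    return k_phi * (1.0 + cos(n * phi - delta))


def van_der_waals(c12, c6, r):
    return c12 / r ** 12 - c6 / r ** 6


def coulomb(q_i, q_j, r, eps_r=1.0):
    return COULOMB_CONSTANT * q_i * q_j / (eps_r * r)


def potential_energy(bonds, angles, impropers, torsions, nonbonded):
    """Sum every term of V; each argument is a list of parameter tuples."""
    v = sum(harmonic(k, b, b0) for k, b, b0 in bonds)
    v += sum(harmonic(k, t, t0) for k, t, t0 in angles)
    v += sum(harmonic(k, x, x0) for k, x, x0 in impropers)
    v += sum(torsion(k, n, phi, d) for k, n, phi, d in torsions)
    for c12, c6, q_i, q_j, r in nonbonded:
        v += van_der_waals(c12, c6, r) + coulomb(q_i, q_j, r)
    return v


# Hypothetical parameters: one stretched bond and one threefold torsion.
bonds = [(300.0, 1.6, 1.5)]
torsions = [(2.0, 3, pi / 3, 0.0)]
print(potential_energy(bonds, [], [], torsions, []))
```

Keeping each term as a separate function mirrors the way parameters are tabulated per interaction type, and makes it clear that the total is meaningful only when all terms come from the same parameter set.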

A central issue is the number of different 
atom types that are used in a particular force 
field. There is always a compromise between 
increasing the number to allow for the inclu- 
sion of more environmental effects (i.e., local 
electronic interactions) vs. the increase in the 
number of parameters to be determined to ad- 
equately represent a new atom type. In gen- 
eral, the more subtypes of atoms (how many 
different kinds of nitrogen, for example), the 
less likely that the parameters for a particular 
application will be available in the force field. 
The extreme, of course, would be a special 
atom type for each kind of atomic environ- 
ment in which the parameters were chosen, so 
that the calculated properties of each molecule 
would simply reproduce the experimental ob- 
servations. One major assumption, therefore, 
is that the force constants (parameters) and 
equilibrium values of the equations are func- 
tions of a limited number of atom types and 
can be transferred from one molecular envi- 
ronment to another. This assumption holds 
reasonably well where one may be primarily 
interested in geometric issues, but is not so 
valid in molecular spectroscopy. This has led
to the introduction of additional equations,
the so-called "cross-terms," which allow
additional parameters to account for correlations
between bond lengths and bond angles
$[K_{b\theta}(b - b_0)(\theta - \theta_0)]$, dihedral angles and bond
angles, and so forth. Because of the lack of
adequate parameterization of the more complex
force fields that are usually specialized to one 
kind of molecule (e.g., proteins or nucleic ac- 
ids), more simplified force fields have gained 
some popularity because of their general ap- 
plicability, despite limited accuracy. 

Examples are the Tripos force field (22), the 
COSMIC force field (23), and that of White 
and Bovill (24), which uses only two atom
types, those at the end of the bond to parame- 
terize the torsional potential rather than the 
four types of the atoms used to define the tor- 
sional angle. One has only to consider the 
number of combinations of 20 atom subtypes 
taken four at a time (160,000) versus two at a
time (400) to understand the explosion of pa- 
rameters that occurs with increased atom sub- 
types. The simplifying assumption in parame- 
terization of the torsional potential reduces to 
some extent the quality of the results (25), but 
allows the use of the simplified force fields (22) 
in many situations where other force fields 
would lack appropriate parameters. The situ- 
ation can become complicated, however. For 
example, the amide bond is normally repre- 
sented by one set of parameters, whether the 
configuration is cis or trans. Experimental
data are quite compelling that the electronic
state is different between the two
configurations, and different parameter sets should be
used for accurate results (Fig. 3.1). Only
AMBER/OPLS currently distinguishes between
these two conformational states (26).
Certainly, the limited parameterization of
simplified force fields would not allow accurate
prediction of spectra, which are more reflective
of the dynamic behavior of the molecule.

Accurate estimates of energy may require 
accurate representation of the dynamics of 
molecules and justify derivation of the larger 
number of parameters. The new version (27) 
of the Allinger force field, MM3, has the objec- 
tive of reproducing spectral data more accu- 
rately than MM2. Much of the chemistry re- 
mains to be incorporated into appropriate 
force fields. Only recently have adequate mod- 
ifications been made to the force fields devel- 
oped for organic molecules to include some 
metals (28-31). Carlsson (32, 33) recently
developed a functional form that allows
electronic d-orbitals of metals to be reasonably
represented within molecular mechanics.

Figure 3.1. Differences in OPLS charge distribution (top) between cis- and trans-isomers of
amide bond and geometries (bottom) as calculated by ab initio parameterization (26).

Because different force fields may use dif- 
ferent mathematical representations of the 
forces between atoms and the details of their 
parameterization will in general differ also, it 
is unwise to use parameters derived for one 
force field to replace missing parameters in
another. One often hears of a "balanced"
parameter set that reproduces well the
phenomena under consideration, but which is
inadequate for other applications. A comparison by
Burkert and Allinger (19) shows the different
van der Waals (VDW) potentials used in several
of the popular force fields, and the situation
has not improved significantly in the
intervening years. Because of other differences
in parameters and functional forms of the
equations used in the rest of the individual
force fields, these quite different approaches
to the VDW potential give excellent results
when used in the correct combination.
Indiscriminate combination of one part of a force
field with another derived independently
would lead to considerable divergence in the
calculated results from those obtained by
experimental observation.

The most extreme difference between force 
fields arises in the method by which the
hydrogen bond is included. Because atoms involved
in a hydrogen bond are often closer than the
sum of their VDW radii, they must be handled
in a special manner. Several force fields have
special functional forms with angular
dependency and special VDW parameters, which ensure
not only that the close approach of the
atoms involved is calculated correctly, but
also that the angular distribution observed for
hydrogen bonds is reproduced. Hagler et al.
(34) used an amide hydrogen with a zero VDW 
radius for hydrogen bonding and a slightly 
greater nitrogen radius to give a correct amide 
hydrogen bond distance. The charges on the 
atoms involved (including the amide hydro- 
gen) are adjusted to give an appropriate bal- 
ance of VDW repulsion and dipole attraction. 
Clearly, the method for handling the electro- 
static interaction is an integral part of each 
force field and cannot be modified indepen- 
dently. 

2.1.2 Electrostatics. The most difficult as- 
pect of molecular mechanics is electrostatics 
(35-38). In most force fields, the electronic dis- 
tribution surrounding each atom is treated as 
a monopole with a simple coulombic term for 
the interaction. The effect of the surrounding 
medium is generally treated with a continuum 
model by use of a dielectric constant. More
detailed approaches with distributed multipole
representations of the electron distribution
(39, 40) and/or efforts to deal with
dielectric inhomogeneity through solution of the
Poisson equation are clear improvements and
have become routine in many studies. Other
difficulties arise in dealing with
macromolecular systems, given that the electrostatic
interaction is long ranged ($1/r$) and the
interactions cannot be arbitrarily terminated with
distance. Electrostatic interactions range
from those operating only at very short
distances that are nonspecific (dispersive
interactions, $r^{-6}$ dependency) to those operating at
very long distances with a high degree of
specificity (charge-charge interactions, $r^{-1}$
dependency).

Dispersive Interactions ($r^{-6}$). These are
attributed to interaction of induced dipoles
within the electron clouds as molecules come 
in proximity and are responsible for the at- 
tractive part of the nonbonded van der Waals 
interaction. 

Dipole-Dipole Interactions ($r^{-3}$). Because of
the nonsymmetrical distribution of electrons 
between atoms of different size and electro- 
negativity, bonds have associated permanent 
dipoles. The interaction energy between two 
of these dipoles depends on their relative ori- 
entation. This is basically the interaction un- 
derlying the phenomenon of the hydrogen 
bond. Although some force field authors use a 
special hydrogen bonding potential with an 
orientation dependency, simple partial charge 
representations combined with appropriate 
VDW parameters can reproduce the effect as 
well (34).

Charge-Dipole Interactions ($r^{-2}$). A charge
interacting with a permanent dipole can be
handled simply by considering the charge
interacting with the two charges at the poles of
the dipole. Alternatively, if the distance
between the poles of the dipole is small compared
with that between the centers of the ion and
the dipole, then the potential energy $\Phi$ can be
approximated as

$$\Phi = e\mu\cos\theta / r^{2}$$

where $e$ is the charge of the ion, $\mu$ is the dipole
moment, $\theta$ is the angle between the vector
connecting the center of the dipole with the
charge and the dipole orientation, and $r$ is the
distance between the center of the ion and the
center of the dipole.
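
The two treatments of the charge-dipole interaction can be compared numerically. In the sketch below the dipole is modeled as charges +q and -q separated by a distance d along the z axis, the factor $1/(4\pi\epsilon)$ is omitted so that energies are in arbitrary units, and the angle is measured from the dipole direction (negative pole to positive pole); these conventions and the numerical values are assumptions chosen to match the form of the approximate expression above.

```python
from math import cos, radians, sin, sqrt


def charge_dipole_exact(e, q, d, r, theta):
    """Ion interacting separately with the +q and -q poles of the dipole."""
    x, z = r * sin(theta), r * cos(theta)      # ion position, dipole at origin
    r_plus = sqrt(x * x + (z - d / 2) ** 2)    # distance to the +q pole
    r_minus = sqrt(x * x + (z + d / 2) ** 2)   # distance to the -q pole
    return e * q * (1.0 / r_plus - 1.0 / r_minus)


def charge_dipole_approx(e, mu, r, theta):
    """Point-dipole approximation: e * mu * cos(theta) / r**2."""
    return e * mu * cos(theta) / r ** 2


e, q, d = 1.0, 1.0, 0.2
theta = radians(30.0)
for r in (0.5, 2.0, 10.0):
    exact = charge_dipole_exact(e, q, d, r, theta)
    approx = charge_dipole_approx(e, q * d, r, theta)
    print(r, exact, approx, abs(exact - approx) / abs(exact))
```

The relative error shrinks roughly as $(d/r)^{2}$, so the approximation is adequate only when the ion is several pole separations away from the dipole.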

Charge-Charge Interactions ($r^{-1}$). The
energy of interaction between two charges $q_1$
and $q_2$ is given by Coulomb's law:

$$\Phi = \frac{q_1 q_2}{4\pi\epsilon r_{12}}$$

where $r_{12}$ is the distance separating the charges
and $\epsilon$ is the dielectric constant of the medium.

To evaluate atom-atom interactions using 
Coulomb's law, the concept of net atomic 
charge is invoked. This amounts to represent- 
ing charge as a point, a monopole, and is an 
artificial construct. Nevertheless, this is the 
common method. Recent improvements in cal- 
culating an appropriate set of point charges, to 
accurately reproduce the molecular electro- 
static potential derived by quantum calcula- 
tions, have been reported (41). 

In an effort to increase the quality of elec- 
trostatic representations, dipole and higher 
multipole moments have been used. There are 
advantages in these more accurate represen- 
tations, with a relatively small computational 
increase attributed to the reductions in dis- 
tances over which the higher moments have to 
be summed, although they do require addi- 
tional effort in the derivation of the parame- 
ters for the higher moments themselves. A 
good example is the distributed multipole 
model of electrostatics derived for peptides. A 
review by Williams (42) discusses the prob- 
lems of deriving a distributed multipole ex- 
pansion of charge representation that accu- 
rately reproduces the molecular electrostatic 
potential derived from quantum calculations. 
Comparisons were made between atomic mul- 
tipoles, bond dipole, and restricted bond dipole 
models. Williams finds that a model for the 
electrostatic potential based on bond dipoles 
supplemented with monopoles (for ions) and 
atomic dipoles (for lone pairs) is most useful. 
Dipole-dipole energy converges much faster 
than monopole-monopole energy. Molecular 
charge at any desired position in a molecule is 
not a physically measurable quantity; one can 
only calculate a delocalized electron probability
distribution from quantum theory. Clearly,
the more complex the representation, the
more accurately one can approximate the
quantum mechanical results, and the more
realistic should be the results obtained. One
complexity of electrostatics is the long dis- 
tances over which interactions occur. Appro- 
priate means of truncating the long-range 
forces to maintain the accuracy of simulations 
are necessary (43-45) and progress in better 
approximations has been reported (46). The 
difficulties with cutoff schemes were demon- 
strated (47, 48) by significant variations in the 
behavior of a 17-residue helical peptide simu- 
lated with explicit waters, using various elec- 
trostatic schemes and by studies (49) of a pen- 
tapeptide in aqueous ionic solution (50). In 
both cases, the Ewald approximation in which 
periodicity is assumed (which allows summa- 
tion over much longer distances) gave supe- 
rior results (47-49). 

2.1.2.1 The Dielectric Problem and Solvation.
Although methods of localizing charge
just described may give reasonable results, the
use of Coulomb's law with a dielectric constant,
a scaling factor related to the polarizability
of the medium between the charges, is
clearly of concern. The dielectric at the
molecular level is neither homogeneous nor
continuous, nor even well defined, and thus violates
the basic assumption of Coulomb's law.
Although the use of a low, uniform dielectric is
more nearly correct in dynamical simulations
where all solute and solvent atoms are
explicitly included, a variety of comparisons of
experimental data with the results of calculation
by use of a simplified solvent model have led to
the realization that much better approaches
are needed. Initial efforts (51) led to the
proposal of a variable dielectric (1/R or 1/4R).
More recently, approaches that
model the inhomogeneity of the dielectric at
the interface between the solute and solvent
by use of the Poisson-Boltzmann equation have
shown considerable promise (52, 53). An
alternative approach that uses the mirror charge
approximation has been described by Schaefer
and Froemmel (54). Excellent reviews (35-38)
of the electrostatic problem have appeared, to
which the reader is referred.
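
The effect of a variable dielectric on a single charge pair can be seen in a short calculation. The sketch assumes that the variable dielectric is read as $\epsilon = R$ or $\epsilon = 4R$ (R in Å), which is a common convention but an interpretation on our part, and it uses kcal/mol, Å, and electron charges with a Coulomb constant of approximately 332.06; the function names are ours.

```python
# Approximate value of 1/(4*pi*eps0) in kcal*Angstrom/(mol*e^2); assumed units.
COULOMB_CONSTANT = 332.06


def uniform(r):
    """Low, uniform dielectric of 1."""
    return 1.0


def linear(r):
    """Distance-dependent dielectric, eps = R (assumed reading)."""
    return r


def four_r(r):
    """Distance-dependent dielectric, eps = 4R (assumed reading)."""
    return 4.0 * r


def coulomb_energy(q1, q2, r, dielectric=uniform):
    """Coulomb energy of two point charges screened by dielectric(r)."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric(r) * r)


for r in (2.0, 4.0, 8.0, 16.0):
    energies = [coulomb_energy(1.0, -1.0, r, model)
                for model in (uniform, linear, four_r)]
    print(r, [round(value, 3) for value in energies])
```

With a distance-dependent dielectric the interaction falls off as $1/R^{2}$ rather than $1/R$, which damps long-range contributions without treating the solvent explicitly.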

Much effort has been given to simple
continuum models of solvation to explain the
origin of solvent effects on conformational
equilibria and reaction rates. The current status of
such efforts, as well as simulations to
rationalize solvation effects, has been reviewed by
Richards et al. (55). There are two general
approaches to the continuum models. The first is
reaction field theory (Bell, Kirkwood,
Onsager) that follows the classical treatment of
Debye-Hückel. The solvent is considered in
terms of charge distribution, polarizability,
and dielectric constant. The solvation energy
is determined simply by considering the solute
as a point dipole that interacts with the
induced charge distribution in the solvent
(Onsager reaction field). An extension by
Sinanoglu in the 1960s partitioned solvation
energy into cavity formation, solvent-solute
interaction, and the "free volume" of the
solute. The logical extension of this approach is
scaled-particle theory (56), in which the free
energy of formation of a hard-sphere cavity of
diameter $\sigma_2$ in a hard-sphere solvent of
diameter $\sigma_1$ and number density $\rho$ is scaled to the
exact solution for small cavity sizes.
Alternatively, the virtual charge approach used a
system of effective and virtual charges
interacting in the gas phase. The Hamiltonian of
the system is modified to include an imaginary
particle, a "solvaton," with an opposite
charge for each of the solute atoms and
solved by the SCF procedure. These continuum
models have met with limited success
(trends and relative effects of solvation
can be predicted), although highly specific
molecular interactions, such as those involving
hydrogen-bonding groups, cannot be
accommodated.

In the equation for calculating affinity of
a drug for a receptor, the ligand is solvated
either by the receptor or by the solvent. This
competition means that accurate determination
of the free energy of solvation is important
in understanding differences in affinities.
Solvation free energy ($G_{sol}$) can be
approximated by three terms: $G_{cav}$, the
formation of a cavity in the solvent to hold the
solute; and $G_{vdW}$ and $G_{pol}$, the interaction
between solute and solvent divided
between VDW and electrostatic forces,
respectively:

$$G_{sol} = G_{cav} + G_{vdW} + G_{pol}.$$

There are four theoretical approaches to the
problem:

1. Scaled Particle Theory (56) 

The essence of the scaled particle theory
is that formation of a cavity in a fluid
requires work. The theory for hard spheres
has been well developed from statistical
mechanics, and the work, $W(R, \rho)$, can be
calculated as follows:

$$
W(R,\rho)/kT = -\ln(1 - y) + \left(\frac{3y}{1 - y}\right)R
  + \left[\frac{3y}{1 - y} + \frac{9}{2}\left(\frac{y}{1 - y}\right)^{2}\right]R^{2}
  + \frac{yP}{\rho kT}R^{3}
$$

where $y = \pi\rho\sigma_1^{3}/6$, $R = \sigma_2/\sigma_1$, $\sigma_2$ is the
diameter of the hard-sphere solute, $\sigma_1$ is
the diameter of the solvent, and $\rho$ is the
number density of the fluid (N/V).

Because this theory includes no interac- 
tion between solvent and solute (i.e., only 
$G_{cav}$ is calculated), effective volumes for
nonspherical compounds with interactive 
groups are normally calibrated from experi- 
ment. This is one way to deal with the energy 
of interaction between solvent and solute. 
For further discussion, see Pollack (57). 

2. Charge Image (or Virtual Charge) Method
(54) (Method for $G_{pol}$ calculation)

This model replaces the solute-contin- 
uum model with one in which a system of 
charges derived from the solute and virtual 
charges in the adjacent space interact in 
the gas phase. A set of mirror charges re- 
flected at the dielectric boundary are cre- 
ated and used in the calculation of the 
electrostatics. 

3. Boundary Element Method (58) (Method
for $G_{pol}$ calculation)

In this approximation, the system is 
modeled by calculating the appropriate 
surface charges at the dielectric boundary. 
This is similar to fitting charges at atomic 
centers to reproduce the molecular electrostatic
potential. For a quantum-mechanical equivalent,
Tomasi et al. (59) introduced a charge
distribution on the surface of a cavity of
realistic shape to include the solvation term in
the Hamiltonian of the solute. The charge
distribution on the surface of the cavity depends
on the solute's electric field, which is affected
in turn by polarization from the cavity's surface.
An iterative QM procedure is used to obtain the
perturbation term. Cramer and Truhlar have
developed AMSOL to include a solvent approximation
in calculations of molecular systems. The approach
has been calibrated by comparison of theoretical
and experimental solvation free energies for
numerous molecular species (60).

4. Poisson-Boltzmann Equation (53) (Method
for $G_{pol}$ calculation)

Generalization of the Debye-Hückel
theory leads directly to the Poisson-Boltzmann
equation that describes the electrostatic
potential of a field of charges with
dielectric discontinuities. This equation 
has been solved analytically for spherical 
and elliptical cavities, but must be solved 
by finite-difference methods on a grid for 
more complicated systems. One exciting 
advance in this area is the development of 
an approximate equation for the reaction 
field acting on a macromolecular solute, at- 
tributed to the surrounding water and ions 
(61). By combining these equations with 
conventional molecular dynamics, solva- 
tion free energies were obtained similar to 
those with explicit solvent molecules, at lit- 
tle computational cost over vacuum simu- 
lations. This implies that a more nearly 
correct solution to the electrostatics prob- 
lem might minimize the solvation problem. 
Other approaches to evaluations of $G_{sol}$
have recently appeared in the literature.
Still et al. (62) estimated $G_{cav} + G_{vdW}$ by
the solvent-accessible surface area times
7.2 cal/mol/Å². $G_{pol}$ is estimated from the
generalized Born equation. Effective solvation
terms have been added (63, 64) to molecular
mechanics force fields to improve
molecular dynamics simulations without 
the cost of modeling explicit solvent. Zau- 
har (65) combined the polarization-charge 
technique with molecular mechanics to ef- 
fectively minimize a tripeptide in solvent. 
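
The scaled particle expression for the cavity work given under the
first approach above is simple enough to evaluate directly. The
following is a minimal sketch and not part of the original treatment:
the function and argument names are hypothetical, the water-like
values in the usage lines (a solvent diameter of 2.75 Å and a number
density of 0.0334 molecules/Å³) are assumptions chosen only for
illustration, and the pressure term is passed as the dimensionless
ratio P/(ρkT), which defaults to zero.

```python
import math

def spt_cavity_work(sigma_solute, sigma_solvent, rho, pressure_term=0.0):
    """Return W/kT for creating a hard-sphere cavity (scaled particle theory).

    sigma_solute, sigma_solvent: hard-sphere diameters in the same length unit.
    rho: solvent number density (molecules per length unit cubed).
    pressure_term: the dimensionless ratio P/(rho*k*T); assumed 0 by default.
    """
    y = math.pi * rho * sigma_solvent ** 3 / 6.0      # y = pi*rho*sigma_1^3/6
    if not 0.0 <= y < 1.0:
        raise ValueError("y must lie in [0, 1) for the expression to hold")
    r = sigma_solute / sigma_solvent                   # R = sigma_2/sigma_1
    a = y / (1.0 - y)
    return (-math.log(1.0 - y)
            + 3.0 * a * r
            + (3.0 * a + 4.5 * a * a) * r ** 2
            + y * pressure_term * r ** 3)

# Assumed, water-like solvent parameters; solute diameters are illustrative.
for solute_diameter in (3.0, 4.0, 5.0):
    w = spt_cavity_work(solute_diameter, 2.75, 0.0334)
    print(f"solute diameter {solute_diameter:.1f} A: W/kT = {w:.2f}")
```

Only the cavity term is obtained this way; the van der Waals and
polarization terms still have to come from one of the other approaches.
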
One final refinement may be necessary in
some situations: the inclusion of electric
polarizability, for example, by inclusion of induced
dipoles, or distributed polarizability (66) in
the electrostatic representation of the model.
Kuwajima and Warshel (67) recently examined
the effects of this refinement in modeling
crystal structures of polymorphs of ice. Such
models including polarizability have been
previously shown useful for predicting the
properties of crystalline polymorphs of polymers
by Sorensen et al. (68). Caldwell et al. (69)
included implicit nonadditive polarization
energies in water-ion outcomes, resulting in
improved accuracy. At the semiempirical level of
quantum theory, Cramer and Truhlar (70-73)
added solvation and solvent effects on
polarizability to AM1, with impressive agreement
between experimental and calculated solvation
energies (60). Rauhut et al. (74) also
introduced an arbitrarily shaped cavity model by
use of standard AM1 theory.

2.1.2.2 The "Hydrophobic" Effect. Water
has been the nemesis of solvation modeling
because of its rather unique thermodynamic
properties, as reviewed by Frank (75) and
Stillinger (76). The biochemical literature
discusses at length "hydrophobic effects" (77).
This effect is not "hydrophobic" at all because 
the enthalpic interaction of nonpolar solutes 
with water is favorable. This, however, is 
counterbalanced by an unfavorable entropic 
interaction that is interpreted as an induced 
structuring of the water by the nonpolar sol- 
ute. Water interacts less well with the nonpo- 
lar solute than it does with itself because of the 
lack of hydrogen-bonding groups on the sol- 
ute. This creates an interface similar to the 
air-water interface, with a resulting surface 
tension attributed to the organization of the 
hydrogen-bonded patterns available. This is 
the so-called iceberg formation around nonpo- 
lar solutes in water, first suggested by Frank 
and Evans. Studies by both molecular dynam- 
ics (78-80) and Monte Carlo simulations (81) 
support this interpretation (76), although 
there is still considerable controversy in inter- 
pretation of experimental data (82). 

2.1.2.3 Polarizability. The traditional
approaches in molecular mechanics have excluded
the effects of charge on induced dipoles
and multibody effects. This approximation be- 
comes a serious limitation when dealing with 
charged systems and molecules like water that
are highly polar. A recent paper (83) from the
Kollman group described nonadditive many-body
potential models to calculate ion solvation
in polarizable water with good agreement
with experimental observation. It was necessary
to include a three-body potential
(ion-water-water) in the molecular dynamics
simulation of the ionic solution to obtain quantitative
agreement with solvation enthalpies and coor- 
dination numbers. Inclusion of a bond-dipole 
model with polarizability in molecular dynam- 
ics simulations has given excellent agreement 
in predicting physical properties of polymers 
by Sorensen et al. (68). 

A novel approach based on the concept of 
charge equilibration has been suggested by 
Rappe and Goddard (84) that allows the inclu- 
sion of polarizabilities in molecular dynamics 
calculations. 

2.1.3 The Potential Surface. The set of
equations that describe the sum of interactions
between the ensemble of atoms under
consideration is an analytical representation
of the Born-Oppenheimer surface, which
describes the energy of the molecule as a
function of the atomic positions. Many important
properties of the molecule can be derived by
evaluation of this function and its derivatives.
For example, setting the value of the first
derivative to zero and solving for the coordinates
of the atoms leads one to minima, maxima,
and saddle points. Evaluation of the sign of the
second derivative can determine which of the 
above have been found. It is a straightforward 
procedure to calculate the vibrational fre- 
quencies from the force constants by evalua- 
tion of the eigenvalues of the secular determi- 
nant (the mass-weighted matrix; see textbook 
on vibrational spectroscopy). Gradient meth- 
ods for the location of energy minima and 
transition states are an essential part of any 
molecular modeling package. It is essential to 
remember, however, that minimization is an 
iterative method of geometrical optimization 
that is dependent on starting geometry, unless 
the potential surface contains only one mini- 
mum (a condition not found for any system of 
sufficient complexity to be of real interest). 
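
As a small illustration of the frequency calculation mentioned above,
the sketch below builds the mass-weighted force-constant matrix for a
one-dimensional diatomic and converts its eigenvalues to wavenumbers.
It is a toy under stated assumptions, not a general normal-mode code:
the force constant of 1900 N/m and the carbon and oxygen masses are
assumed, roughly CO-like values, and all function names are
hypothetical.

```python
import math

AMU = 1.66053906660e-27      # kilograms per atomic mass unit
C_CM = 2.99792458e10         # speed of light in cm/s

def mass_weighted_hessian(k, m1, m2):
    """Mass-weighted second-derivative matrix for E = (k/2)(x2 - x1 - r0)^2."""
    off = -k / math.sqrt(m1 * m2)
    return [[k / m1, off], [off, k / m2]]

def eigenvalues_2x2(h):
    """Eigenvalues of a symmetric 2x2 matrix, in ascending order."""
    a, b, d = h[0][0], h[0][1], h[1][1]
    mean = 0.5 * (a + d)
    half_gap = math.hypot(0.5 * (a - d), b)
    return mean - half_gap, mean + half_gap

def wavenumber(eigenvalue):
    """Convert an eigenvalue (s^-2) to a wavenumber in cm^-1."""
    return math.sqrt(max(eigenvalue, 0.0)) / (2.0 * math.pi * C_CM)

k = 1900.0                                  # assumed force constant, N/m
m_c, m_o = 12.000 * AMU, 15.995 * AMU       # assumed atomic masses
low, high = eigenvalues_2x2(mass_weighted_hessian(k, m_c, m_o))
print(f"translation: {wavenumber(low):.1f} cm^-1")
print(f"stretch:     {wavenumber(high):.0f} cm^-1")
```

The near-zero eigenvalue corresponds to translation of the whole
molecule; the other equals the force constant divided by the reduced
mass, which is the familiar diatomic result.
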

The ability to locate both minima and
transition points enables one to determine the
minimum energy reaction path between any
two minima. In the case of flexible molecules,
these minima could correspond to conformers
and the reaction path would correspond to the
most likely reaction coordinate. One could
estimate the rate of transition by determination
of the height of the transition states (the
activation energy) between the minima. Elbers
(85) developed a new protocol for the location
of minima and transition states and applied it
to the determination of reaction paths for the
conformational transition of a tetrapeptide
(86). Huston and Marshall (87) used this
approach to map the reaction coordinates of the
α- to 3₁₀-helical transition in model peptides.

Despite the limitations that curtail exact 
quantitative applications, molecular mechan- 
ics can provide three-dimensional insight as 
the geometric relations between molecules are 
adequately represented. Electrical field poten- 
tials can be calculated and compared to give a 
qualitative basis for rationalizing differences 
in activity. Molecular modeling and its graph- 
ical representation allow the medicinal chem- 
ist to explore the three-dimensional aspects of 
molecular recognition and to generate hypoth- 
eses that lead to design and synthesis of new 
ligands. The more accurate the representation 
of the potential surface of the molecular sys- 
tem under investigation, the more likely that 
the modeling studies will provide qualitatively 
correct solutions. 

2.1.3.1 Optimization. The search for the
optimal solution to a complex problem is
common to many areas in science and engineering
and does not have a general solution. Numerous
approaches to this problem, which is generally
referred to as optimization, have been
used in chemistry: most commonly, distance
geometry, molecular dynamics, stochastic
methods such as Monte Carlo sampling, and
systematic, or grid, search. Most rely on
minimization, often combined with a stochastic
search. Minimization algorithms have been
thoroughly characterized with regard to their
convergence properties but, in general, locate
only the local minimum closest to the starting
geometry of the system. A stochastic approach
to starting geometries can be combined with
minimization to find a subset of minima in the
hope that the global minimum is contained
within the subset and can readily be identified
by its potential value compared with that of
the other minima.
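
The combination of stochastic starting geometries with minimization
can be sketched on a one-dimensional toy surface. The torsional
potential below (a onefold plus a threefold cosine term, arbitrary
energy units), the steepest-descent minimizer, and all names are
assumptions made for illustration; a real application would minimize
a molecular mechanics energy over many coordinates.

```python
import math
import random

def energy(phi):
    """Toy torsional energy; global minimum at phi = pi (anti)."""
    return (1 + math.cos(phi)) + 1.5 * (1 + math.cos(3 * phi))

def gradient(phi):
    return -math.sin(phi) - 4.5 * math.sin(3 * phi)

def minimize(phi, step=0.05, tol=1e-10, max_iter=100000):
    """Steepest descent: finds only the minimum of the starting basin."""
    for _ in range(max_iter):
        g = gradient(phi)
        if abs(g) < tol:
            break
        phi -= step * g
    return phi

def multistart(starts, decimals=4):
    """Minimize from every start; return distinct minima sorted by energy."""
    found = {}
    for start in starts:
        phi = minimize(start) % (2 * math.pi)
        found[round(phi, decimals)] = energy(phi)
    return sorted(found.items(), key=lambda item: item[1])

rng = random.Random(0)                       # fixed seed for repeatability
starts = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(20)]
for phi, e in multistart(starts):
    print(f"minimum at {math.degrees(phi):6.1f} deg, energy {e:.3f}")
```

Each start reaches only the minimum of its own basin; the lowest entry
in the sorted list is the candidate global minimum, with no guarantee
that every basin was sampled.
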

2.1.3.2 Potential Smoothing. One approach
to global optimization that has shown
promise is potential smoothing (88). This ap- 
proach uses a mathematical transformation to 
smooth the multidimensional potential en- 
ergy surface of a molecule, reducing the high 
frequency complexity of the surface and mak- 
ing it much easier to search for minimum en- 
ergy conformations. This concept was first 
used to deform the conformational potential 
energy surface in the diffusion equation 
method (DEM) of Piela and coworkers (89).
Search procedures will not confront multiple 
local minima on the deformed surface. If the 
procedure is reversed iteratively, then one can 
trace the path back into a region that lies near 
the global minimum of the undeformed poten- 
tial surface. Ponder et al. (88, 90) improved 
the procedure for tracing back from one par- 
tially deformed surface to the next by includ- 
ing a local search procedure to limit detection 
of false minima. 
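
The idea of deforming the surface and then reversing the deformation
can be illustrated on a one-dimensional torsional potential written as
a cosine series. For such a series the diffusion equation has a
closed-form solution: after a deformation "time" t, the term in
cos(kφ) is simply damped by exp(-k²t). The sketch below is a toy under
that assumption; the coefficients, the reversal schedule, and the
function names are hypothetical and are not taken from the DEM or PSS
implementations.

```python
import math

# Assumed cosine-series coefficients a_k: E(phi) = sum_k a_k cos(k*phi)
COEFFS = {0: 2.5, 1: 1.0, 3: 1.5}

def smoothed_energy(phi, t):
    """Energy after diffusion for time t; t = 0 is the original surface."""
    return sum(a * math.exp(-k * k * t) * math.cos(k * phi)
               for k, a in COEFFS.items())

def smoothed_gradient(phi, t):
    return sum(-a * k * math.exp(-k * k * t) * math.sin(k * phi)
               for k, a in COEFFS.items())

def minimize(phi, t, step=0.05, tol=1e-10, max_iter=200000):
    """Steepest descent on the surface deformed to time t."""
    for _ in range(max_iter):
        g = smoothed_gradient(phi, t)
        if abs(g) < tol:
            break
        phi -= step * g
    return phi

def smoothing_search(phi, schedule=(1.0, 0.5, 0.25, 0.1, 0.0)):
    """Minimize on the most deformed surface, then reverse step by step."""
    for t in schedule:
        phi = minimize(phi, t)
    return phi

start = 0.5
print("direct minimization :", round(math.degrees(minimize(start, 0.0)), 1))
print("smoothing and return:", round(math.degrees(smoothing_search(start)), 1))
```

In this symmetric toy the tracked minimum does not move as the
deformation is reversed; on a real surface it drifts, and minima can
split, which is why a local search at each stage of the reversal is
useful.
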

One of the best known benchmark prob- 
lems for conformational search involves the 
determination of the low energy conforma- 
tions of the highly flexible cycloheptadecane 
(91, 92). This system continues to serve as a 
test for newly developed search methods (93). 
Although not a particularly large molecule, 
this system is a challenge because of its flexi- 
bility and the close energy spacing of the lower 
lying minima. Extensive analysis through a 
variety of search methods has located ex- 
actly 263 minima within 3.0 kcal/mole of the 
purported global minimum. The potential 
smoothing search (PSS) (88) was dramatically 
effective at locating many of the lowest energy 
structures for cycloheptadecane. Although the 
global minimum for cycloheptadecane was not 
located, the second lowest energy structure 
was located and differed by only 0.01 kcal/ 
mole. Based on its MM2 vibrational frequen- 
cies, the global minimum is entropically disfa- 
vored relative to all of the minima located by 
the smoothing procedure. The PSS method 
was also applied to obtain the minimum en- 
ergy conformation of the TM helix dimer of 
glycophorin A (GpA) (94), previously solved by 
solution NMR spectroscopy (95).

2.1.3.3 Genetic Algorithm. Another approach
to global optimization is the genetic
algorithm. This approach is based on biological
evolution and is analogous to natural selection
(96-98). In applications to computational
chemistry, evolution on the computer has 
been shown to be an efficient approach to 
global optimization, although because of sam- 
pling issues, there is no guarantee that the 
global optimum has been found in any partic- 
ular application (99). 

2.1.3.3.1 Characteristics of the Genetic Algorithm.
In analogy to natural selection, the
parameters to be optimized are encoded in a 
bit string and strung together in a "chromo- 
some." Each chromosome in the population 
represents a particular genotype or solution to 
the problem under consideration (i.e., a spe- 
cific set of values for the parameters that de- 
termine the configuration of the system under 
study). The values of the parameters have to 
be decoded for the "fitness" of a particular ge- 
notype to be evaluated. Once the fitness of 
each chromosome in the population has been 
evaluated, then the more "fit" members are 
allowed to reproduce, mutate, or cross over 
with other members of the parent population 
to generate a new daughter population. This 
process is repeated until the fitness of the pop- 
ulation converges, or until the available com- 
puter cycles are consumed. 

2.1.3.3.2 Example of Conformational Analysis.
The simplifying assumption of rigid geometry
is used to reduce the computational
complexity of the model problem of conforma- 
tional analysis. The elimination of variables is 
rationalized based on the high energy cost as- 
sociated with bond length distortions and the 
ability to accommodate bond angle deforma- 
tions by a reduced set of van der Waals radii. 
To represent the conformation of a molecule, 
one needs only to specify the values of the tor- 
sional angles associated with rotatable bonds. 
One can assign a set number N of bits, 6 for
example, to represent $2^N$ values for the
torsional angles. Each set of 6 bits can be
considered a "gene" and crossover allowed only at
gene boundaries, if desired. Thus, the
conformation of a molecule can be encoded as a set of
torsional genes. The actual coordinates of the
molecule corresponding to each genotype
must be generated for the fitness function F,
in this case internal energy, to be numerically
evaluated by molecular mechanics. Each
chromosome in the population is evaluated for its
internal energy and a subset of the more fit 
selected for reproduction. The degree of limi- 
tation on reproductive fitness is analogous to 
the selective pressure brought to bear on a 
population (i.e., selection of the fittest). This is 
a parameter that can be varied in most GA 
programs and one must balance selective pres- 
sure against maintaining some variation in 
the population for evolution to occur (to avoid 
being trapped in a local minimum). The set of 
chromosomes to be reproduced can be based 
on some arbitrary criteria (the top 50%), all 
those with fitness at least half that of the most 
fit chromosome detected, or the fitness scaled 
in some way and chromosomes reproduced in 
proportion to their scaled fitness. 

Given a subset of chromosomes to reproduce,
several operations analogous to evolution
are invoked. First is mutation, where a
certain number of randomly selected bits are 
mutated from 0 to 1 or vice versa in the daugh- 
ter chromosome. This would allow for changes 
in the settings of one or more torsional angles. 
A certain number of pairs of chromosomes are 
also selected for crossover and one or more 
locations between genes (if specified) are ran- 
domly selected and the two pieces derived 
from each parent chromosome swapped, to 
generate two or more novel chromosomes. 
This would allow for different subsets of con- 
formations to be combined; this provides a 
mechanism for concerted changes or jumps 
over barriers to find minima that would be 
difficult to sample by mutation alone. This 
would appear to be the feature that provides 
the analogous behavior to simulated anneal- 
ing in efficient searching of parameter space. 
In this case, however, the search is more di- 
rected by the selective pressure of increasing 
the "fitness" or facing elimination from the 
population. In other words, each new genera- 
tion should have eliminated a significant por- 
tion of the less fit members of the previous 
generation and propagated those torsional 
values that generate good local conforma- 
tional states. 
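
The scheme just described can be sketched in a few dozen lines. The
example below is a toy and makes several assumptions that are not in
the text: four rotatable bonds, an internal energy that is a sum of
independent threefold torsional terms (standing in for a molecular
mechanics evaluation), a mutation rate of 0.02 per bit, selection of
the top 50%, and retention of the best chromosome unchanged in each
generation. All names are hypothetical. Lower energy is treated as
higher fitness.

```python
import math
import random

BITS = 6                      # bits per torsional "gene" (64 values)
N_TORSIONS = 4                # assumed number of rotatable bonds

def decode(chromosome):
    """Convert a list of bits into torsional angles in degrees."""
    angles = []
    for i in range(0, len(chromosome), BITS):
        value = int("".join(map(str, chromosome[i:i + BITS])), 2)
        angles.append(value * 360.0 / 2 ** BITS)
    return angles

def energy(chromosome):
    """Toy internal energy: each torsion has its minimum at 180 degrees."""
    total = 0.0
    for angle in decode(chromosome):
        phi = math.radians(angle)
        total += (1 + math.cos(phi)) + 1.5 * (1 + math.cos(3 * phi))
    return total

def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in chromosome]

def crossover(a, b, rng):
    """Single-point crossover restricted to gene boundaries."""
    cut = BITS * rng.randrange(1, N_TORSIONS)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def evolve(pop_size=40, generations=30, rate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(BITS * N_TORSIONS)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=energy)
        history.append(energy(pop[0]))
        parents = pop[:pop_size // 2]        # only the top 50% reproduce
        children = [pop[0]]                  # best chromosome kept unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            for child in crossover(a, b, rng):
                children.append(mutate(child, rate, rng))
        pop = children[:pop_size]
    pop.sort(key=energy)
    history.append(energy(pop[0]))
    return pop[0], history

best, history = evolve()
print("best torsions:", decode(best))
print("best energy by generation:", [round(e, 2) for e in history[::5]])
```

Raising the selected fraction or the mutation rate keeps more
variation in the population at the cost of slower convergence, which
is the balance of selective pressure discussed above.
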

2.1.3.3.3 Schema and the Building Block
Hypothesis. Once a population of good local
substates has been established, then crossover
can probe the combination of these
subconformations that have positive interactions
leading to more fit progeny. In the jargon of
computer science, the subpattern of 1's and 0's
giving a preferred subconformation would be a
schema (or building block). According to the
most accepted theory, the building block hy- 
pothesis, the genetic algorithm initially de- 
tects biases toward fitness in lower order 
(fewer identical bits) schemas and converges 
on this part of search space (the entire set of 
bit strings). By combining information from 
lower order schema through crossovers, biases 
in higher order schemas are detected and 
propagated. 

The strong convergence property of the ge- 
netic algorithm is a major attraction. Given 
sufficient members of the population and suf- 
ficient evolutionary time (number of genera- 
tions), then one can expect convergence if the 
fitness function is based on the optimal com- 
bination of locally optimized substructures. 
Some fitness functions are termed "decep- 
tive," in that low order schemas are not 
present in higher order schemas and their 
propagation slows detection of the more fit 
higher order schemas. Another problem arises 
when the population size is too small or the 
selection factor too high. Then, the genetic al- 
gorithm can magnify a small sampling error 
and prematurely converge in a local optimum. 

2.1.3.3.4 Mutations and Encoding. There
are different ways to encode numbers as bit
strings, and these can have some influence on
the impact of mutation. Traditional binary
encoding requires that all bits be changed for
some cases if the digital value is to be simply
incremented. This causes erratic behavior near
an optimum, with mutations in higher order bits
having more effect than mutations in lower
order bits.
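
The point about incrementing a value can be checked by counting bit
flips. The sketch below compares traditional binary encoding with
reflected Gray code, a commonly used alternative in which successive
integers differ in exactly one bit. Gray code is not named in the text
and is introduced here only as an assumed example of "a different way
to encode"; the function names are hypothetical.

```python
def binary_to_gray(n):
    """Reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def worst_case_flips(bits, encode):
    """Largest number of bit flips needed to increment a value by one."""
    return max(hamming(encode(v), encode(v + 1))
               for v in range(2 ** bits - 1))

BITS = 6
print("binary, 31 -> 32:", hamming(31, 32), "flips")
print("binary worst case:", worst_case_flips(BITS, lambda v: v))
print("Gray worst case:  ", worst_case_flips(BITS, binary_to_gray))
```

With a 6-bit torsional gene, stepping from 31 to 32 (one grid
increment) requires all six bits to change in binary encoding, so a
single mutation cannot make that small move.
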

2.1.3.3.5 Crossovers and Encoding. In our
example, we indicated that one might want to 
separate the bit string into genes correspond- 
ing to torsional angles because the gene has a 
coherent meaning in the context of the prob- 
lem. If one restricts crossovers to the junctions 
between genes, then the coherence of the con- 
formation of molecular fragments is preserved 
and one is more likely to make a successful 
crossover producing more fit offspring. There 
are methods such as random-key encoding
(97) to generalize the process of crossovers
without requiring customized crossover oper- 
ators that are problem specific, although this 
is beyond the scope of this chapter. 

2.1.3.3.6 Examples of Applications to
Biochemical Problems. McGarrah and Judson
(100) explored the impact of different
parameter settings on the ability of the GA to explore
the conformational space of cyclo(Gly6). Each
residue was represented by four angles, each
with a string of four bits (1/16 of range). A
selection fraction of 50% was used, which
eliminated the lower half in fitness from
reproduction. Population sizes of 10, 50, and 100
were tested. Each group was divided into four
niche populations with communication
between groups. Local minimization was
performed for each chromosome before
evaluation. They concluded that it was of little use to
examine a population size of less than 100 
members for the 24 variables examined. As 
soon as convergence in the average is detected 
in a population, it should be cross-fertilized 
from another niche or GA evolution should 
terminate. It is a clear example of a hybrid 
approach, in which GA does a rough search for 
minima and local minimization to find the 
closest local minimum. 

Judson et al. (101) examined the use of a 
genetic algorithm to find low energy conform- 
ers of 72 small to medium organic molecules 
(1-12 rotatable bonds) whose crystal struc- 
tures were known. They used the elitist strat- 
egy, in which the best individual from each 
generation is propagated without modification.
A population size of 10 times the number
of the nonring dihedral angles being varied 
was chosen. Each molecule was allowed to run 
for 10,000 energy evaluations, or until the 
population was bit converged. In a few cases, 
conformers with lower energies than those ob- 
served in the crystal structure were found. A 
comparison with CSEARCH in SYBYL (Tripos,
Inc.) was made, but the differences in
efficiencies found were not compelling. In only 9
of the 72 cases examined did the GA's best
conformer have an energy greater than that of
the crystal structure, with the largest deviation
being only 0.8 kcal/mol.

The GA approach has also been applied to
the docking problem with dihydrofolate
reductase, arabinose binding protein, and sialidase
(98). A typical run took minutes on a
workstation and the predicted conformations agreed
with those observed crystallographically in all
cases. Meadows and Hajduk (102) used
experimental constraints with a GA to
dock biotin to streptavidin. Judson et al. (101)
also reported docking of flexible molecules
into the active sites of thermolysin,
carboxypeptidase, and dihydrofolate reductase.
In 9 of the 10 cases examined, the GA found
conformations within 1.6 Å root-mean-square
(rms) of the relaxed crystal conformation.

This approach has also been used in the
PRO_LIGAND de novo design program (103)
to optimize the structure of ligands for a
binding site. A set of candidate structures was
generated and then crossover between molecular
fragments used to optimize the predicted
binding mode. This is similar to the SPLICE
program of Ho and Marshall (104) that evolves
ligands with more favorable interactions with
a given site.

Payne and Glen (105) studied several dif- 
ferent aspects of molecular recognition with 
genetic algorithms. Conformations and orien- 
tations were determined which best-fit con- 
straints such as inter- or intramolecular dis- 
tances, electrostatic surface potentials, or 
volume overlaps with up to 30 degrees of freedom.

2.1.4 Systematic Search and Conformational
Analysis. Because of the convoluted nature
of the potential energy surface of molecules,
minimization usually leads to the
nearest local minimum (106, 107) and not the
global minimum. In addition, many problems
in structure-activity studies require geometric
solutions that may not be at the global
minimum of the isolated molecule. To scan the
potential surface with some surety of
completeness, systematic, or grid, search procedures
have been developed. To understand the
strengths and limitations of this approach,
some of the algorithmic details must be
considered. These are discussed in depth in a
review by Beusen et al. (108).

2.1.4.1 Rigid Geometry Approximation. A
simplifying assumption that is usually
invoked to reduce the computational complexity
of the problem through elimination of
variables is that of rigid geometry. The rationale is
based on the high energy cost associated with
bond-length distortions and the ability to
accommodate bond-angle deformations by a
reduced set of VDW radii. This approach is
compatible with problems where one is most
interested in eliminating conformations that
are energetically unlikely (i.e., sterically
disallowed) because of VDW interactions, which
cannot be relieved by bond-angle deformation.
A successful application requires that one
calibrate an appropriate set of VDW radii for the
particular application area. Iijima et al. (109)
calibrated such a set (Fig. 3.2) for peptide
application by comparison with experimental
crystallographic data from proteins and
peptides.

Figure 3.2. Calibrated set of van der Waals radii
for peptide backbone for use with rigid geometry
approximation (109). Usual radii shown in
parentheses. Carbonyl carbon not modified.

2.1.4.2 Combinatorial Nature of the
Problem. Using the rigid geometry assumption,
one can analyze the combinatorial complexity
of a simplified approach to the problem with
some ease. Let us assume a molecule (Fig. 3.3)
of N atoms with T torsional degrees of freedom
(i.e., rotatable bonds). For each torsional
degree of freedom, explored at a given angular
increment in degrees Δ, there are 360/Δ values
to be examined. This means that
$(360/\Delta)^T$ sets of angles, each describing a
unique conformation, need to be examined for
steric conflict. For each conformer, the
starting geometry will have to be modified by
applying the appropriate transformation
matrices to different subsets of atoms to generate
the coordinates of the conformation. For each
conformation, $N(N - 1)/2$ distance
determinations will have to be calculated to a first
approximation (this does not exclude bonded
atoms and atoms bonded to the same atom from
the check, which is necessary) and checked
against the allowed sum of VDW radii for the
two atoms involved. The number of VDW
comparisons V is given by

$$V = (360/\Delta)^T \times N(N - 1)/2$$

It should be clear that the VDW comparisons
are the rate-limiting step by their sheer
number, and any algorithmic improvement
that reduces the number of such checks or
enhances the efficiency of performing such
checks is of value.

Figure 3.3. Schematic diagram of molecule with
N atoms and T rotatable bonds.
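
The growth of the brute-force count with the number of rotatable
bonds is easy to tabulate. The following sketch simply evaluates the
expression for V given above; the 20-atom molecule and the 30° and 10°
increments in the usage lines are assumed values for illustration, and
the function names are hypothetical.

```python
def conformer_count(torsions, increment_deg):
    """Number of grid conformations: (360/increment) ** torsions."""
    if 360 % increment_deg != 0:
        raise ValueError("increment must divide 360 evenly")
    return (360 // increment_deg) ** torsions

def vdw_comparisons(n_atoms, torsions, increment_deg):
    """Brute-force count: every atom pair checked in every conformation."""
    pairs = n_atoms * (n_atoms - 1) // 2
    return conformer_count(torsions, increment_deg) * pairs

# Assumed example: a 20-atom molecule with 2 to 8 rotatable bonds.
for t in range(2, 9, 2):
    coarse = vdw_comparisons(20, t, 30)
    fine = vdw_comparisons(20, t, 10)
    print(f"T = {t}: {coarse:.3e} checks at 30 deg, {fine:.3e} at 10 deg")
```

Each added rotatable bond multiplies the count by 360/Δ, so refining
the grid from 30° to 10° costs a factor of 3 per torsion.
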

2.1.4.3 Pruning the Combinatorial Tree.
From this simplified analysis, a systematic
search of other than the smallest molecules at
a coarse increment would appear daunting. A
hybrid approach with a coarse grid search
followed by minimization has been successfully
used to locate minima. There are a number of
algorithmic improvements over the "brute
force" approach that enhance the applicability
of the systematic search itself. To understand
these improvements, some concepts
need to be defined. First is the concept (110) of
aggregate, a set of atoms whose relative
positions are invariant to rotation of the T
rotational degrees of freedom. n-Butane is divided
into aggregates as an illustration (Fig. 3.4).

In this simple example, the atoms in an
aggregate are all either directly bonded or have a
1-3 relationship (i.e., are related by a bond
angle). Because of the rigid geometry
approximation, their relative positions are fixed.
Atoms contained within the same aggregate do
not, therefore, have to be included in the set of
those that undergo VDW checks for each
conformation. For linear molecules, there are n - 1
bonds and the number of 1-3 interactions
depends on the valence of the atom. This
simplification leads to a reduction in the
N(N - 1)/2 factor of the number of VDW checks, which
is multiplied by the number of conformations.

How can one reduce the number of confor- 
mations that have to be checked? Here the 
concept of construction becomes useful. One 
constructs the conformations in a stepwise 
fashion, starting with an initial aggregate and 
adding a second aggregate at a given torsional 
increment for the torsional variable T that is 
applied to the rotatable bond connecting the 
two. If any pair of atoms overlaps for that in- 
crement, then one can terminate the construc- 
tion because no addition operation will relieve 
that steric overlap. In effect, one has trun- 
cated the combinatorial possibilities that 
would have included that subconformation; 
that is, one has pruned the combinatorial tree. 
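
Stepwise construction with pruning can be compared against brute
force on a deliberately simplified model. The sketch below is a
two-dimensional analog, not the three-dimensional torsional problem of
the text: a chain of unit-length bonds is grown in a plane, a turning
angle at each bond plays the role of the torsional variable, each
added atom is its own "aggregate," and a new atom is checked only
against atoms further away than a 1-3 relationship. The turn values,
the contact distance, and all names are assumptions.

```python
import itertools
import math

TURNS = (-120, -60, 0, 60, 120)   # assumed allowed turn at each bond, degrees
CLASH = 0.9                       # assumed minimum nonbonded distance

def step(pos, heading_deg):
    """Place the next atom one bond length along the given heading."""
    rad = math.radians(heading_deg)
    return (pos[0] + math.cos(rad), pos[1] + math.sin(rad))

def clashes(atoms, new):
    """Check a new atom against all atoms except its 1-2 and 1-3 partners."""
    return any(math.dist(a, new) < CLASH for a in atoms[:-2])

def brute_force(n_bonds):
    """Build every conformation completely, then accept or reject it."""
    allowed, placements = [], 0
    for turns in itertools.product(TURNS, repeat=n_bonds):
        atoms, heading, ok = [(0.0, 0.0), (1.0, 0.0)], 0, True
        for t in turns:
            heading += t
            new = step(atoms[-1], heading)
            placements += 1
            ok = ok and not clashes(atoms, new)
            atoms.append(new)
        if ok:
            allowed.append(turns)
    return allowed, placements

def pruned(n_bonds):
    """Grow the chain stepwise; abandon a branch at the first overlap."""
    allowed, placements = [], 0

    def grow(atoms, heading, turns):
        nonlocal placements
        if len(turns) == n_bonds:
            allowed.append(tuple(turns))
            return
        for t in TURNS:
            new = step(atoms[-1], heading + t)
            placements += 1
            if clashes(atoms, new):
                continue          # no later addition can relieve this overlap
            grow(atoms + [new], heading + t, turns + [t])

    grow([(0.0, 0.0), (1.0, 0.0)], 0, [])
    return allowed, placements

full, n_full = brute_force(6)
tree, n_tree = pruned(6)
print("allowed conformations:", len(full), "and", len(tree))
print("atom placements: brute force", n_full, "pruned", n_tree)
```

Both routes return the same allowed conformations; the pruned search
places fewer atoms because shared partial conformations are built once
and overlapping ones are never extended.
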

Figure 3.4. Decomposition of n-butane molecule
into aggregates.

Figure 3.5. Distance between atoms (1-7) and
atom 10 separated by a single rotatable bond T can
be described with a transformation of the equation
of a circle describing the locus of atom 10 as bond T
is rotated. Notice that distance D between any atom
(1-7) and center of circle of rotation of atom 10 that
is on axis of rotation is fixed, regardless of value of T.

2.1.4.4 Rigid Body Rotations. If one
constructs the molecule stepwise by the addition
of aggregates, then one has two sets of atoms
to consider. First are those in the partial
molecule (set A), previously constructed, that
have been found to be in a sterically allowed
partial conformation. For each possible
addition of the aggregate, the atoms of the
aggregate (set B) must be checked against those in
the partial molecule. If one uses the concept of
a rigid body rotation, then one can describe
the locus of possible positions of any atom in
set B as a circle whose center lies on the axis of
rotation (the interconnecting bond) at a
distance along the axis that can be calculated.
The formula for a circle can be transformed to
represent the possible distances between the
atom b in set B and any atom a in set A as
shown in Fig. 3.5. An equation with scalar
coefficients that describes the variable distance
between two atoms as a function of a single
torsional variable was derived (111), which
has a discriminant whose evaluation can be
used to determine whether atom a and atom b
will:

• he in contact, despite changesin the value of 
the torsional rotation of the aggregate, 
which implies that the current partial con- 
formation has to be discarded, given that 
there is no possible way to add the aggregate 
that is sterically allowed; 

• never come in contact for any value of the 
torsional rotation, so that this pair of atoms 
can be removed from consideration regard- 
ing this aggregate; or 

• come in contact for some values of the tor- 
sional rotation that can be calculated for 
that pair and that removes a segment of the 



Figure 3.6. Scheme for combining systematic 
search with analytical solution for closure. Bonds 
indicated by arrows were systematically scanned, 
whereas those indicated by A were analytically de- 
termined. Dotted bond can represent either chemi- 
cal bond or experimental distance determination 
(NOE, etc.). 

torsional circle from consideration for other 
atom pairs. If all segments of the torsional 
circle are disallowed by combinations of the 
angular requirements of different atom 
pairs, then the partial conformation of the 
molecule is disallowed because further con- 
struction is not feasible. As a first approxi- 
mation, this removes a degree of torsional 
freedom from the problem, reducing T to 
T - 1 torsional degrees of freedom. At a 10°
torsional scan, an approximate reduction in
computational complexity of a factor of 36
results. 
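The three outcomes listed above can be illustrated with a minimal sketch. It assumes that the squared distance between a fixed atom a and a rotating atom b has the form d²(θ) = A - B cos θ - C sin θ, which follows from the circle construction of Fig. 3.5, so that its extreme values are A ± (B² + C²)^½. The function name `classify_pair`, its arguments, and the use of a single contact distance are hypothetical choices made for illustration; this is not the formulation of ref. 111.

```python
import math

def classify_pair(a, b0, p0, u, contact):
    """Classify a fixed atom a against an atom b that rotates about an axis.

    a, b0   : (x, y, z) of atom a and of atom b at torsion angle zero
    p0, u   : a point on the rotation axis and the unit vector along it
    contact : distance below which the two atoms are considered to clash
    """
    sub = lambda x, y: tuple(xi - yi for xi, yi in zip(x, y))
    dot = lambda x, y: sum(xi * yi for xi, yi in zip(x, y))
    cross = lambda x, y: (x[1] * y[2] - x[2] * y[1],
                          x[2] * y[0] - x[0] * y[2],
                          x[0] * y[1] - x[1] * y[0])

    # Center of the circle traced by b: projection of b0 onto the axis.
    t = dot(sub(b0, p0), u)
    center = tuple(p + t * ui for p, ui in zip(p0, u))
    radial = sub(b0, center)
    r = math.sqrt(dot(radial, radial))
    ac = sub(a, center)

    # d^2(theta) = A - B cos(theta) - C sin(theta)
    A = dot(ac, ac) + r * r
    amplitude = 0.0
    if r > 0.0:
        e1 = tuple(x / r for x in radial)   # in-plane unit vector at theta = 0
        e2 = cross(u, e1)                   # in-plane unit vector at theta = 90 deg
        B = 2.0 * r * dot(ac, e1)
        C = 2.0 * r * dot(ac, e2)
        amplitude = math.hypot(B, C)

    d2_min, d2_max = A - amplitude, A + amplitude
    if d2_max < contact ** 2:
        return "always"      # clash at every torsion: discard partial conformation
    if d2_min >= contact ** 2:
        return "never"       # pair can be dropped from further checks
    return "sometimes"       # a segment of the torsional circle is excluded
```

For an atom b rotating on a unit circle about the z axis and an atom a placed 3 units from the axis in the same plane, the distance varies between 2 and 4, so the classification depends only on where the contact distance falls relative to that interval.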

2.1.4.5 The Concept and Exploitation of
Rings. Realization that many of the relevant 
constraints in chemistry can be expressed as 
interatomic distances, VDW interactions, nu- 
clear Overhauser effect constraints, and so 
forth allows use of the concept of a virtual ring 
in which the constraint forms the closure 
bond. Small rings up to six members can be 
solved analytically (112), so that one can 
search the torsional degrees of freedom asso- 
ciated with a constraint until only five remain 
and then solve the problem analytically (Fig. 
3.6). The torsional angles for those degrees of 
freedom are no longer sampled on a grid, thus 
removing the problem of grid tyranny, in 
which valid conformations are missed by the 
choice of increment and starting conforma- 
tion. This approach is then a hybrid because 
only part of the conformational space is 
searched with regular torsional increments. It 
is, however, much more efficient to solve a set 
of equations than search 5 torsional degrees of 
freedom. 
Figure 3.7. (a) Two-dimensional (Ram-
achandran) plot of energy vs. backbone tor-
sional angles, Φ and Ψ, for N-acetyl-valine-
methylamide. (b) Three-dimensional plot
of energy vs. torsional angles, Φ, Ψ, and χ1,
for N-acetyl-valine-methylamide.

2.1.4.6 Conformational Clustering and
Families. In a congeneric series, the corre-
spondence between torsional rotation vari- 
ables is maintained as one compares mole- 
cules, and a direct comparison of the values 
allowed for one molecule with those allowed 
for another is meaningful. Two- (2D) or three- 
dimensional (3D) plots (Fig. 3.7) of torsional 
variables against energy often provide consid- 
erable insight into the difference in conforma- 
tional flexibility between two molecules. Such 
a plot of the peptide backbone torsional angles
Φ and Ψ is known as a Ramachandran plot. When
more than three torsional variables become 
necessary to define the conformation of the 
molecule under consideration, then multiple 
plots become necessary to represent the vari- 
ables. Unless special graphical functions are 
included in the software, then correlations be- 
tween plots become difficult, given that each 
plot is a projection of a multidimensional 
space. One approach to this problem is to use 
cluster analysis programs to identify those 
values of the multidimensional variables that 
Figure 3.8. Cycloalkane rings and
number of local minima found by vari- 
ous search strategies, n, number of con- 
formers with MM2 (117); parentheses, 
number of conformers with MM3 (117); 
#, number of conformers within 25 kJ/ 
mol of global minima [MM2 (92)]; *, 
number of minima found within 3 kcal/ 
mol of global minima (115). 



are adjacent in N-space. The clusters of con- 
formers that result have been referred to as 
families. A member of a family is capable of 
being transformed into another conformer be- 
longing to the same family without having to 
pass over an energy barrier; that is, the mem-
bers of a family exist within the same energy
valley. 
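As a rough illustration of grouping conformers that are adjacent in N-space, the sketch below treats each sterically allowed conformer from a grid search as a tuple of torsional grid indices and collects conformers into families by following single-increment neighbors, with wraparound at 360°. The function name `families` and the adjacency rule (one grid step in one torsion) are assumptions made for this example and are not taken from any particular clustering program.

```python
from collections import deque

def families(conformers, n_steps):
    """Group grid-search conformers into families of adjacent grid points.

    conformers : list of tuples of torsional grid indices (0 .. n_steps - 1)
    n_steps    : number of grid points per torsion (36 for a 10 degree scan)
    """
    allowed = set(conformers)
    seen = set()
    result = []
    for start in conformers:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        family = []
        while queue:
            current = queue.popleft()
            family.append(current)
            # Visit the neighbors one increment away along each torsion.
            for axis in range(len(current)):
                for step in (-1, 1):
                    neighbor = list(current)
                    neighbor[axis] = (neighbor[axis] + step) % n_steps
                    neighbor = tuple(neighbor)
                    if neighbor in allowed and neighbor not in seen:
                        seen.add(neighbor)
                        queue.append(neighbor)
        result.append(sorted(family))
    return result
```

Because adjacency on the grid is only a proxy for the absence of an energy barrier, a finer increment or an energy check between neighbors would be needed before treating each family as a single energy valley.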

Because of the combinatorial nature of sys- 
tematic search, one is often faced with large 
numbers of conformers that have to be ana- 
lyzed. For some problems, energetic consider- 
ations are appropriate and conformers can be 
clustered with the closest local minimum, pro- 
viding to a first approximation an estimate of 
the entropy associated with each minimum by
the number of conformers associated, in that 
they can come from a grid search that approx- 
imates the volume of the potential well. A sin- 
gle conformer, perhaps the one of lowest en- 
ergy, can be used with appropriately adjusted 
error limits in further analyses as representa- 
tive of the family. 

2.1.4.7 Conformational Analysis. Although
interaction with a receptor will certainly per- 
turb the conformational energy surface of a 
flexible ligand, high affinity would suggest 
that the ligand binds in a conformation that is 
not exceptionally different from one of its low 



energy minima. Mapping the energy surface of 
the ligand in isolation to determine the low 
energy minima will, at the very least, provide a 
set of candidate conformations for consider- 
ation, or as starting points for further analy- 
ses. The problem of finding the global mini- 
mum on a complicated potential surface is 
common to many areas, and lacks a general 
solution. Minimization procedures locate the 
closest local minimum depending on the start- 
ing conformation. Several strategies have been de-
veloped to map the potential surface and lo-
cate minima. For an excellent overview of the 
different approaches, the reader is referred to 
the surveys by Leach (113) and by Burt and 
Greer (114). Stochastic methods such as 
Monte Carlo have been advocated (115) for 
conformational analysis and their usefulness 
demonstrated on carbocyclic ring systems (91, 
115-121) (Fig. 3.8). Molecular dynamics can 
be used to explore the potential energy sur- 
face, often with simulated annealing to help 
overcome activation-energy barriers, but ex- 
ploration is concentrated in local minima and 
duplication of the surface explored is con- 
trolled by Boltzmann's law. A systematic, or 
grid, search samples conformations in a regu- 
lar fashion, at least in the parameter space 
(usually torsional space) that is incremented. 
Comparisons of a variety of methods were
made on cycloheptadecane by Saunders et al. 
(91) and it was concluded that the stochastic 
method was most efficient. In one of the few 
independent comparisons of the effectiveness 
of these procedures, Boehm et al. (122) studied 
the sampling properties on the model system 
caprylolactam, a nine-membered ring, and 
concluded that systematic search was both in- 
efficient and ineffective at finding the minima 
found by the other methods when the number 
of conformers examined was limited. 

2.1.4.8 Other Implementations of System-
atic Search. Numerous other implementa- 
tions of systematic, or grid, search programs 
exist in the literature and those with protein 
applications have been reviewed by Howard 
and Kollman (123), whereas those for small or 
medium sized molecules are included in the 
reviews by Burt and Greer (114) and by Leach 
(113). One of the more widely used programs 
in organic chemistry, MACROMODEL, has a 
search module (124) coupled to energy mini- 
mization for conformational analysis. MAC- 
ROSEARCH has been developed by Beusen et 
al. (125) to generate the set of conformers con-
sistent with experimental NMR data and used 
to determine the conformation of a 15-residue 
peptide antibiotic. 

2.1.5 Statistical Mechanics Foundation (126).

To understand the relationships between the 
simulation methods and the desired thermo- 
dynamic quantities, a short review of the ma- 
jor concepts of statistical mechanics may be in 
order. This is not meant to be comprehensive, 
but rather to remind the reader of the relevant 
ideas. 

The set of configurations generated by the 
Monte Carlo simulation generates what J. 
Willard Gibbs would call an "ensemble," as- 
suming that the number of molecules in the 
simulation was large and the number of con- 
figurations was also large. This ensures that 
the possible arrangements of molecules that 
are energetically reasonable have been ade- 
quately sampled. One is often interested in the 
statistical weight W of a particular observable.
For example, a particular conformation of a 
solute molecule, say, the staggered rotamer of 
ethane, could be compared with another con- 
former, the eclipsed rotamer, in a simulation 



with solvent. If more configurations of the sur- 
rounding solvent molecules of equivalent en- 
ergy were available to the staggered than to 
the eclipsed, then the staggered would have a 
higher statistical weight. From the inscription 
on Boltzmann’s tomb, we all recall that S = k
ln W, where S is the entropy and k is Boltz-
mann’s constant. Thus, we have a link be-
tween statistics and thermodynamics. W in 
this case would be the number of configura- 
tions associated with the particular conforma- 
tion of ethane under consideration divided by 
the total number of configurations sampled. 
This would have to be weighted by their en- 
ergy, of course, unless the distribution was al- 
ready Boltzmann weighted, as happens when 
one uses the Metropolis algorithm (127). 

Another way of stating this is that the prob-
ability P_i of a particular configuration i is
its Boltzmann factor divided by the sum of the
Boltzmann factors of all N configurations or
states:

$$P_i = \frac{\exp(-E_i/kT)}{\sum_{j=1}^{N} \exp(-E_j/kT)}$$

The denominator in this equation has been 
given a special name, partition function, often 
symbolized by Z, which is derived from the 
German Zustandssumme (sum over states).
The successive terms in the partition function 
describe the partition of the configurations 
among the respective states available. One
can express the thermodynamic state func- 
tions of an ideal gas in terms of the molecular 
partition function Z as follows: 

$$S = k \ln W = kN \ln(Z/N) + U/T + kN$$

where N is the number of molecules and U is 
the internal energy. From this and the as- 
sumption of an ideal gas pV = NkT, the Gibbs 
free energy G = U - TS + pV leads to

$$G = -NkT \ln(Z/N)$$

and similarly, the Helmholtz free energy A = 
U - TS leads to the expression 
$$A = -kT \ln(Z^N/N!)$$

all of which may be more familiar if expressed
in terms of enthalpy, H = U + pV.
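The Boltzmann probabilities defined earlier in this section can be evaluated directly for a small set of states. The sketch below is only illustrative: the function name `boltzmann_populations`, the choice of kcal/mol as the energy unit, and the two-state example are assumptions, not values from the text.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K), i.e., on a molar basis

def boltzmann_populations(energies, temperature):
    """Return P_i = exp(-E_i/kT) / sum_j exp(-E_j/kT) for a list of energies."""
    kT = K_B * temperature
    # Shifting by the lowest energy leaves the ratios unchanged and avoids overflow.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)  # partition function relative to the lowest state
    return [w / z for w in weights]
```

For two states separated by a few kcal/mol at room temperature, nearly all of the population falls in the lower state, which is why high energy configurations contribute so little to Boltzmann-weighted averages.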

In summary, by simulating a relevant sta- 
tistical sample of the possible arrangements of 
molecules when interacting, one can derive 
the macroscopic thermodynamic properties by 
statistical analysis of the results. In this case, 
one is deriving the partition function not by 
theoretical analysis of the quantum states 
available to the molecule, but through simula- 
tion. In other words, the average properties 
are valid if the Monte Carlo or molecular dy- 
namics trajectories are ergodic, that is, con- 
structed such that the Boltzmann distribution
law is in accord with the relative frequencies 
with which the different configurations are 
sampled. (An ergodic system is by definition 
one in which the time average of the system is 
the same as the ensemble average.) A basic 
concept in statistical mechanics is that the 
system will eventually sample all configura- 
tions, or microscopic states, consistent with 
the conditions (temperature, pressure, vol- 
ume, other constraints) given sufficient time. 
In other words, a trajectory of sufficient length 
(in time) would sample configuration space. 

2.1.6 Molecular Dynamics (37, 126, 128). 

Molecular dynamics is a deterministic process 
based on the simulation of molecular motion 
by solving Newton's equations of motion for 
each atom and incrementing the position and 
velocity of each atom by use of a small time 
increment. If a molecular mechanics force 
field of adequate parameterization is available
for the molecular system of interest and the 
phenomenon under study occurs within the 
time scale of simulation, this technique offers 
an extremely powerful tool for dissecting the 
molecular nature of the phenomenon and the 
details of the forces contributing to the behav-
ior of the system.

In this paradigm, atoms are essentially a 
collection of billiard balls, with classical me- 
chanics determining their positions and veloc- 
ities at any moment in time. As the position of 
one atom changes with respect to the others, 
the forces that it experiences also change. The 
forces on any particular atom can be calculated 



by evaluation of the energy of the system using 
the appropriate force field. From physics, 

$$F = ma = -\frac{\partial V}{\partial r} = m\frac{\partial^2 r}{\partial t^2}$$

where F is the force on the atom, m is the mass
of the atom, a is the acceleration, V is the po-
tential energy function, and r represents the 
cartesian coordinates of the atom. Using the 
first derivative of the analytical expression for 
the force field allows the calculation of the 
force felt on any atom as a function of the po- 
sition of the other atoms. 

2.1.6.1 Integration. In this simulation, we
use numerical integration; that is, we choose a 
small time step (smaller than the period of fast- 
est local motion in the system) such that our 
simulation moves atoms in sufficiently small in- 
crements, so that the position of surrounding 
atoms does not change significantly per incre- 
mental move. In general, this means that the 
time increment is on the order of $10^{-15}$ s (1 fem-
tosecond). This reflects the need to adequately
represent atomic vibrations that have a time 
scale of 10“^^ to 10“^^ s. For each picosecond of 
simulation, we need to do 1000 iterations of the 
simulation. For each iteration, the force on each 
atom must be evaluated and its next position 
calculated. For simulations involving molecules 
in solvent, sufficient solvent molecules must be 
included, so that the distance from any atom in 
the solute to the boundary of the solvent is
larger than the decay of the intermolecular in- 
teraction between the solute and solvent mole- 
cules. This requires several hundred solvent 
molecules for even small solutes, and the com- 
putations to do a single iteration are sufficiently 
large that simulations of more than several hun-
dred picoseconds for proteins with explicit sol-
vent are still rare. Efforts to increase the time 
step and thus allow for longer simulations with- 
out sacrificing the accuracy of the methodology 
are under investigation. Combination of normal 
mode calculations with explicit numerical inte-
gration allows time steps up to 50 ps for model
systems (129). A similar approach has been 
shown effective by Schlick and Olson (130) in 
modeling supercoiling of DNA.

Let us attempt a rough trajectory through 
molecular dynamics. We have a system of N 
atoms obeying classical Newtonian mechan-
ics. In such a system, we can represent the
total energy as the sum of kinetic energy 
and potential energy 

$$E_{tot}(t) = E_{kin}(t) + V_{pot}(t)$$

where the potential energy is a function of the 
coordinates, $V_{pot} = f(r_i)$ for atoms i = 1 to N, where $r_i$
represents the Cartesian coordinates of atom i; and
the kinetic energy depends on the motion of the
atoms:

$$E_{kin}(t) = \sum_{i=1}^{N} \frac{1}{2} m_i v_i^2(t)$$

where $m_i$ is the mass of atom i and $v_i$ is the
velocity of atom i.

The energy undergoes constant redistribu- 
tion because of the movements of the atoms, re- 
sulting in changes in their positions on the po- 
tential surface and in their velocities. At each 
iteration (t → t + Δt), an atom i moves to a new
position [r_i(t) → r_i(t + Δt)], and it experiences a
new set of forces. The basic assumption is that
the time step Δt is sufficiently small that the
position of atom i at t + Δt can be linearly ex-
trapolated from its velocity at time t and the
acceleration resulting from the forces felt by
atom i at time t. If Δt is long enough for the
atoms surrounding atom i to change their posi-
tion so that the forces felt by atom i will change
during Δt, then the approximation is not valid
and the simulation will deviate from that ob-
served with a shorter Δt. After each atom is
moved, the forces on the first atom based on the 
new positions of the other N - 1 atoms can be 
recalculated and a new iteration begun. Several 
algorithms exist for numerical integration. The 
ones by Verlet and Gear are in common use, 
with the one by Verlet being computationally 
more efficient (126). A variant of the Verlet al- 
gorithm in common use is called the leapfrog 
algorithm. The calculation of the velocity is done 
at t - Δt/2, whereas the calculation of the force
occurs at t to derive the new velocity at t + Δt/2.
In other words,

$$v_i(t + \Delta t/2) = v_i(t - \Delta t/2) + F_i(t)\,\Delta t / m_i$$

The atomic position of atom i is calculated by
adding the incremental change in position,
v_i(t + Δt/2) · Δt, to the original position.
By staggering the evaluation of the velocity
and force calculations by Δt/2, an improve-
ment in the simulation performance is ob-
tained. 
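The leapfrog update described above can be written in a few lines. The sketch below is a minimal one-dimensional illustration for a single particle; the harmonic force, the unit mass and force constant, and the function names are assumptions chosen so that the result can be compared with the analytical solution.

```python
import math

def leapfrog(x, v_half, force, mass, dt, n_steps):
    """Advance one particle in one dimension with the leapfrog scheme.

    x      : position at time t
    v_half : velocity at time t - dt/2
    force  : function returning the force at a given position
    """
    for _ in range(n_steps):
        v_half = v_half + force(x) * dt / mass   # velocity at t + dt/2
        x = x + v_half * dt                      # position at t + dt
    return x, v_half

def harmonic_force(x, k=1.0):
    """Force for the model potential V = k x^2 / 2."""
    return -k * x

def exact_harmonic_position(x0, t, k=1.0, mass=1.0):
    """Analytical position for a particle released at rest from x0."""
    return x0 * math.cos(math.sqrt(k / mass) * t)
```

To start from rest at time zero, the velocity at -Δt/2 can be estimated by stepping the initial velocity back half a time step with the initial force. With a time step that is small relative to the oscillation period, the integrated position stays close to the analytical cosine.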

2.1.6.2 Temperature. For simulations that
can be compared with experimental results, 
one must be able to control the temperature of 
the simulation. The temperature of a system is 
a function of the kinetic energy, 

$$T(t) = \frac{1}{3Nk} \sum_{i=1}^{N} m_i v_i^2(t)$$

where k is Boltzmann's constant. 

One can perform molecular dynamics sim-
ulations at a constant temperature T_0 by
scaling all atomic velocities at each step
by a factor derived from

$$\frac{dT(t)}{dt} = \frac{T_0 - T(t)}{\tau}$$

where T_0 is the desired temperature and τ is
the relaxation time of the temperature coupling.
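A minimal sketch of temperature control by velocity scaling follows. The instantaneous temperature is computed from the kinetic energy with 3N degrees of freedom, and the scaling factor λ = [1 + (Δt/τ)(T_0/T - 1)]^½ is the form commonly used for weak coupling to a heat bath; the text gives only the differential equation, so this factor, the function names, and the SI units are assumptions of the example.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def instantaneous_temperature(masses, velocities):
    """Temperature from kinetic energy, assuming 3N degrees of freedom.

    masses     : list of atomic masses in kg
    velocities : list of (vx, vy, vz) tuples in m/s
    """
    twice_kinetic = sum(m * sum(c * c for c in v)
                        for m, v in zip(masses, velocities))
    return twice_kinetic / (3 * len(masses) * K_B)

def scaling_factor(t_now, t_target, dt, tau):
    """Velocity scaling factor that relaxes T toward t_target with time constant tau."""
    return math.sqrt(1.0 + (dt / tau) * (t_target / t_now - 1.0))
```

When τ equals the time step, the scaling restores the target temperature in a single step; larger values of τ couple the system more gently and perturb the dynamics less.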

2.1.6.3 Pressure and Volume. Depending
on the simulation that one desires to accom- 
plish, either the pressure or volume must be 
maintained constant. Constant volume is the 
easiest to perform because the boundaries of 
the system are maintained with all molecules 
confined within those boundaries and the 
pressure allowed to change during the simula- 
tion. 

2.1.7 Monte Carlo Simulations. The Monte 
Carlo method (126) is based on statistical me- 
chanics and generates sufficient different con- 
figurations of a system by computer simula- 
tion to allow the desired structural, statistical, 
and thermodynamic properties to be calcu- 
lated as a weighted average of these properties 
over these configurations. The average value 
⟨X⟩ of the property X can be calculated by the
following formula:

$$\langle X \rangle = \frac{\sum_{i=1}^{N} X_i \exp(-E_i/kT)}{\sum_{i=1}^{N} \exp(-E_i/kT)}$$
Figure 3.9. Schematic diagram of simulation with periodic boundary conditions in which adjacent
cells are generated by simple translations of coordinates.



where N is the number of configurations, X_i is the value of the property in configuration i, E_i is
the energy of configuration i, k is Boltzmann's 
constant, and T is temperature. 

If we have sufficiently sampled the possible 
arrangements of molecules in the simulation 
and have an accurate method to calculate 
their energy E, then the above formula will 
give a Boltzmann weighted average of the 
property X. 

In practice, one must compromise the num- 
ber of molecules in the simulation and/or the 
number of configurations calculated to con- 
serve computer cycles. Two essential tech- 
niques that are utilized are periodic boundary 
conditions and sampling algorithms, which we 
discuss separately. 

Although it is important to minimize the 
number of molecules in either Monte Carlo or 
molecular dynamics simulations for computa- 
tional convenience, surface effects at the in- 
terface between the simulated solvent and the 
surrounding vacuum could seriously distort 



the results. To approximate an "infinite" liq- 
uid, one can surround the box of molecules by 
simple translations to generate periodic im- 
ages. Each atom in the central box has a set of 
related molecules in the virtual boxes sur- 
rounding the central one (Fig. 3.9). The en- 
ergy calculations for pairwise interactions 
consider only the interaction of a molecule, or 
its "ghost," with any other molecule, but not 
both. In practice, this is accomplished by lim- 
iting pairwise interactions to distances less 
than one-half the length of the side of the box. 
Real concerns often arise regarding conver-
gence of electrostatic terms because they decay
only as the inverse first power of distance.
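The rule of considering a molecule or its periodic image, but not both, is usually implemented as a minimum image calculation. The sketch below assumes a cubic box of side `box`; the function name and the cubic geometry are assumptions made for illustration.

```python
import math

def minimum_image_distance(r1, r2, box):
    """Distance between two points using the nearest periodic image.

    r1, r2 : (x, y, z) coordinates
    box    : side length of the cubic periodic box
    """
    d2 = 0.0
    for a, b in zip(r1, r2):
        delta = a - b
        # Shift by a whole number of box lengths so that |delta| <= box / 2.
        delta -= box * round(delta / box)
        d2 += delta * delta
    return math.sqrt(d2)
```

Two particles near opposite faces of the box are therefore treated as close neighbors. Restricting pairwise interactions to distances below one-half the box length ensures that at most one image of each partner is counted.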

For any large nontrivial system, the total 
number of possible configurations is beyond 
comprehension. Consider a set of protons in a 
magnetic field: the magnetic moments can be 
either aligned with or opposed to the magnetic 
field. For only 50 protons, there are $2^{50}$ (about $10^{15}$) com-
binations, which is a large number. For a
small cyclic pentapeptide, there are poten-
tially $36^{10}$ conformations if one considers a 10°
scan of the torsional variables (Φ, Ψ). Clearly,

some of these are energetically unreasonable 
because the conformation requires overlap of 
two or more atoms in the structure. Monte 
Carlo simulations are successfully performed 
by sampling only a limited set of the energeti- 
cally feasible conformations, say, 10® out of 
10^®® theoretical possibilities. The reason for 
this success is that the Monte Carlo schemes 
sample those states that are statistically most 
important. One could sample all states, calcu- 
late the energy of each, and then Boltzmann- 
weight its contribution to the average. Alter- 
natively, one can ignore those states that are 
energetically high so that they contribute lit- 
tle, if any, weight to the average, and concen- 
trate on those of low energy. In other words, 
we look only where there are reasonable an- 
swers energetically. This is called importance 
sampling, which is the key to the Monte Carlo 
procedure. 

One aspect shared by Monte Carlo meth- 
ods and molecular dynamics is the ability to 
cross barriers. In the case of Monte Carlo, 
barrier crossing occurs both by random se- 
lection of variables and by acceptance of 
higher energy states on occasion. Both 
methods require an equilibration period to 
eliminate bias associated with the starting 
configuration. When one considers ran- 
domly filling a box with molecules with arbi- 
trary choices for position and orientation, it 
should be obvious that most examples would 
result in high energy, especially if the den- 
sity of such a simulation is made to resemble 
that of a liquid in which adjacent molecules 
are often in VDW contact. High energy con- 
figurations contribute very little to the prop- 
erties we are trying to evaluate because they 
are Boltzmann weighted. It is, therefore, ex- 
tremely inefficient to randomly calculate 
configurations. One needs procedures, often 
referred to as importance sampling, that se- 
lectively calculate configurations that will 
be representative of allowed states. In fact, if 
one can guarantee that the energy of the 
configurations actually has a Boltzmann dis- 
tribution, then one can simply average the 
properties. In practice, this has been accom- 
plished by an algorithm suggested by Me- 



tropolis et al. (127). One essentially uses a 
Markov process in which the current config- 
uration becomes the basis for generating the 
next. 

1. A molecule in the current configuration is 
chosen at random and its degrees of free- 
dom randomly varied by small increments. 

2. The energy of the new configuration is 
evaluated and compared with that of the 
starting configuration. 

3. If the new energy is lower, the new config- 
uration is accepted and becomes the basis 
for the next random perturbation. 

4. If the energy is higher, E(new) >
E(old), then a random number between 0
and 1 is generated and compared with
exp{-[E(new) - E(old)]/kT}. If the num-
ber is less, then the configuration is ac-
cepted and the process continues by gen- 
erating a new configuration. If the 
number is greater, then the configuration 
is rejected and the process resumes with 
the old configuration. 
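The four steps above can be sketched for a single coordinate in a model potential. The harmonic energy function, the step size, the reduced units (kT = 1), and the function names are assumptions made for this illustration; a molecular simulation would perturb the degrees of freedom of a randomly chosen molecule instead.

```python
import math
import random

def metropolis_accept(e_old, e_new, kT, rng):
    """Accept downhill moves always; accept uphill moves with Boltzmann probability."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / kT)

def sample_harmonic(n_steps, kT=1.0, max_step=2.0, seed=0):
    """Metropolis sampling of x for the model energy E(x) = x^2 / 2."""
    rng = random.Random(seed)
    energy = lambda x: 0.5 * x * x
    x = 0.0
    samples = []
    for _ in range(n_steps):
        trial = x + rng.uniform(-max_step, max_step)   # random small perturbation
        if metropolis_accept(energy(x), energy(trial), kT, rng):
            x = trial                                   # new configuration accepted
        samples.append(x)                               # a rejected move recounts the old one
    return samples
```

Because the accepted configurations are already Boltzmann distributed, a simple unweighted mean over the samples estimates the thermal average; for this model potential the mean of x² should approach kT.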

In this way, configurations of lower en- 
ergy are accepted and the system eventually 
"minimizes" to sample the higher populated 
lower energy configurations; at the same 
time, higher energy configurations are in- 
cluded but only in proportion to their Boltz- 
mann distribution, which is clearly a func- 
tion of temperature of the simulation. 
Because the configurations occur with a 
probability depending on their energy and 
proportional to the Boltzmann distribution, 
one can simply average thermodynamic 
properties over this distribution of configu- 
rations, 

$$\langle X \rangle = \frac{1}{N} \sum_{i=1}^{N} X_i$$

where the sum covers the N configurations 
generated. Because one often does not know 
an appropriate starting configuration, the 
initial part of the run may be used to "min- 
imize," or equilibrate the system, and only 








Figure 3.10. Estimation of
difference in affinity (ΔΔG)
of the two anions Cl⁻ and
Br⁻ for the cryptand SC24
[(a) structural formula; (b)
schematic of complex formed
with halide ion] as the pa-
rameters for Cl⁻ are slowly
mutated into those for Br⁻
in water (dashed line) as well as in
the complex (solid line). Used with
permission (138).



the latter part of the simulation analyzed 
once the configurational energy has stabi- 
lized. 

A useful application has combined Monte 
Carlo sampling with variable temperatures 
(simulated annealing) to encourage barrier 
crossing to optimize the docking of ligands 
into active sites. Random displacements of 
rigid body translation and rotation (6 degrees 
of freedom) and of internal torsional rotations 
in a substrate within the binding site cavity 
were performed with Metropolis sampling and
a temperature program. This procedure repro-
duced the crystallographically observed struc-
ture of the complex for several test cases (131).

2.1.8 Thermodynamic Cycle Integration
(132-134). Thermodynamic cycle integration
is an approach that allows calculation of the 
free energy difference between two states. In 
this method, one takes advantage of the state- 
function nature of a thermodynamic cycle and 
eliminates the paths of the simulation with 
long time constants (e.g., formation of a com- 
plex requiring diffusion). As an example, the 
difference in affinity of two ligands (L and M) 
for the same enzyme or receptor R is described 
by the following thermodynamic cycle: 



$$\begin{array}{ccc}
R + L & \xrightarrow{\ \Delta A_1\ } & RL \\
\downarrow \Delta A_3 & & \downarrow \Delta A_4 \\
R + M & \xrightarrow{\ \Delta A_2\ } & RM
\end{array}$$

Because the thermodynamic values of the two
states do not depend on the path between the 
states, one can write the following equation: 

$$\Delta\Delta A = \text{difference in affinity of L and M for R} = \Delta A_2 - \Delta A_1 = \Delta A_4 - \Delta A_3$$

By simulating the mutation of L into M, paths 
A3 and A4, one can avoid the long simulation 
required for diffusion of the ligands, paths A1 
and A2, into the receptor. One simply incre- 
mentally modifies the potential functions rep- 
resenting ligand L to those representing li- 
gand M during the course of the simulation, 
making sure that the perturbations are intro- 
duced gradually and that the surrounding at- 
oms have time to relax from the perturbation 
(Fig. 3.10). Either Monte Carlo (135) or molec-
ular dynamics simulations can utilize this
technique. Many interesting applications have 
appeared in the literature (132, 134, 136, 137).
Their success appears directly related to sam- 
pling problems and minimal perturbation of 
the ligand to ensure equilibration. 
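The bookkeeping of the cycle can be made concrete with a short sketch. It takes the two simulated mutation free energies (L into M free in solution and L into M bound to R) and returns the difference in affinity, then converts it to a ratio of binding constants through exp(-ΔΔA/RT). The function names, the use of kcal/mol, and the numerical values in the usage note are illustrative assumptions, not results from the literature cited.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)

def relative_binding_free_energy(dA_mutation_free, dA_mutation_bound):
    """Difference in affinity of M and L for R from the two mutation legs.

    dA_mutation_free  : free energy of mutating L into M in solution (path A3)
    dA_mutation_bound : free energy of mutating L into M in the complex (path A4)
    """
    return dA_mutation_bound - dA_mutation_free

def affinity_ratio(ddA, temperature=298.15):
    """Ratio of binding constants K_M / K_L implied by a free energy difference."""
    return math.exp(-ddA / (R_KCAL * temperature))
```

With hypothetical mutation free energies of -3.00 kcal/mol in solution and -4.36 kcal/mol in the complex, the difference is -1.36 kcal/mol, which corresponds to roughly a tenfold tighter binding of M at room temperature.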

2.1.9 Non-Boltzmann Sampling. There are 
equivalent molecular dynamics and Monte 
Carlo procedures that allow one to sample re- 
gions of configuration space that are not min- 
ima, transition states, for example. One can 
generate a Monte Carlo trajectory for a system 
Ey that has energetics similar to that of the 
Boltzmann system E,, with sampling in the 
region associated with a transition barrier by 
subtracting a potential V, to reduce the bar- 
rier: 

Eq — Ey V. 

Alternatively, one may want to obtain mean- 
ingful statistics for a rare event without over- 
sampling the lower energy states. This can be 
accomplished by adding a potential W, which 
is zero for the interesting class of configu-
rations and very large for all others (Fig. 3.11):

Eo = Ey+W 

The details of these sampling procedures that 
allow one to focus on the aspect of the problem 
of interest are the subject of a review by Bev- 
eridge (133). Application of this approach to 
determining conformational transitions in 
model peptides (137, 139, 140) is exemplified
in the work of Elber’s group on helix-coil (85,
86, 141), the Brooks group on turn-coil (142-
146), and Huston and Marshall and Smythe et
al. (147, 148) on helical transitions in peptides.

2.2 Quantum Mechanics: Applications 
In Molecular Mechanics 

Detailed discussion of quantum mechanics 
(149) is clearly beyond the scope of this review,
and its applications to molecular mechanics 
and modeling will be briefly summarized. Mo- 
lecular mechanics is based on the laws of clas- 
sical physics and deals with electronic interac- 
tions by highly simplified approximations 
such as Coulomb's law. All forces operating in
intermolecular interactions are essentially 
electronic in nature. Any effort to quantitate 
those forces requires detailed information 
Figure 3.11. Schematic diagrams of methods for
modifying the potential surface to allow adequate 
sampling during simulations. 



about the nuclear positions and the electron 
distribution of the molecules involved. At con- 
siderable computational cost, quantum me- 
chanics provides information about both nu- 
clear position and electronic distribution. 
Molecular mechanics is built on the assump- 
tion that electronic interactions can be ade- 
quately accounted for by parameterization. 
Although most of the systems of interest in 
biology are too large for the direct application 
of quantum mechanics, quantum mechanics 
has at least three essential roles to play in drug 
design (149): (1) charge approximations, (2) 
Figure 3.12. Different approaches
to localization of charge used in elec-
trostatic models. (a) Atom-centered
monopole; (b) atom-centered dipole;
and (c) atom-centered quadrupole.



characterization of molecular electrostatic po-
tentials, and (3) parameter development for
molecular mechanics. 

2.2.1 Parameterization of Charge. Esti- 
mates of charges in molecular mechanics can 
be derived, in general, by application of one of 
the many different quantum chemical ap-
proaches, either ab initio or semiempirical.
Quantum mechanical methods are available
for calculating the electron probability distri-



butions for all the electrons in a molecule and 
then partitioning those distributions to yield 
representations for the net atomic charges of 
atoms in the molecule, either as atom-cen- 
tered charges or as more complex distributed 
multipole models (39, 42) (Fig. 3.12). 

2.2.1.1 Atom-Centered Point Charges. In
the Mulliken population analysis, all the one- 
center charge on an atom is assigned to that 
atom, whereas the two-center charge is di- 
vided equally between the two atoms in the 
overlap (even if the electronegativities of the
two atoms are quite dissimilar). The sum is 
the gross atomic population, and the net
atomic charge is simply the nuclear charge
minus this population. The result is very sensitive to the basis
set (the number of atomic orbitals) used. De- 
spite poor fit of the molecular electrostatic po- 
tential derived with point charges to the ab 
initio electrostatic potential, or that derived 
from a distributed multipole analysis (150), 
widespread use continues because they do re- 
flect chemical trends and are reportedly com- 
patible with known electronegativities. In ad- 
dition, this option is commonly available in 
software packages. Unfortunately, poor repre- 
sentation of the electric field surrounding the 
molecule results from use of atom-centered 
monopole models (42), even when more care- 
ful methods are used to distribute the charge. 
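The Mulliken partitioning described above can be sketched in a few lines. This is a minimal illustration, assuming a density matrix P and an overlap matrix S are already available from some quantum chemical calculation; the function name, the argument layout, and the two-orbital H2-like numbers are hypothetical and are not taken from any particular program.

```python
import numpy as np

def mulliken_charges(P, S, ao_to_atom, nuclear_charges):
    """Net atomic charges from a Mulliken population analysis.

    P, S            : density and overlap matrices in the atomic-orbital basis
    ao_to_atom      : for each atomic orbital, the index of the atom it sits on
    nuclear_charges : nuclear charge of each atom
    """
    P = np.asarray(P, dtype=float)
    S = np.asarray(S, dtype=float)
    # Gross population of orbital mu: sum over nu of P[mu, nu] * S[nu, mu].
    # The diagonal term is the one-center part; each off-diagonal (overlap)
    # term is shared equally between the two orbitals involved.
    gross_ao = np.einsum("mn,nm->m", P, S)
    gross_atom = np.zeros(len(nuclear_charges))
    np.add.at(gross_atom, ao_to_atom, gross_ao)
    # Net charge = nuclear charge minus gross atomic population.
    return np.asarray(nuclear_charges, dtype=float) - gross_atom

# Hypothetical minimal-basis H2: one doubly occupied bonding orbital.
s = 0.66
S_h2 = np.array([[1.0, s], [s, 1.0]])
c = 1.0 / np.sqrt(2.0 * (1.0 + s))
P_h2 = 2.0 * c * c * np.ones((2, 2))
charges_h2 = mulliken_charges(P_h2, S_h2, [0, 1], [1.0, 1.0])
```

Because the overlap population is always split evenly, a polarized bond between atoms of different electronegativity is handled in exactly the same way as the symmetric case, which is the weakness the text points out.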

2.2.1.2 Methods to Reproduce the Molecular
Electrostatic Potential (MEP). The electrostatic
potential surrounding the molecule that is
created by the nuclear and electronic charge
distribution of the molecule is a dominant
feature in molecular recognition. Williams
reviews (42) methods for calculating charge
models that accurately represent the MEP as
calculated by ab initio methods with large basis
sets. The choice between models (monopole,
dipole, quadrupole, bond dipole, etc.; Fig. 3.12)
depends on the accuracy with which one desires to
reproduce the MEP. This desire has to be balanced
against the increased complexity of the model and
its resulting computational costs when
implemented in molecular mechanics.

The first problem is to select the points where
the MEP is to be evaluated and eventually
fitted: the position of the shell outside the
VDW radii of the atoms in the molecule, and the
spacing of grid points on that shell. Sampling
too close to the nuclei gives rise to anomalies
because the potential around nuclei is always
positive. Singh and Kollman (151) report the use
of four surfaces at 1.4, 1.6, 1.8, and 2.0 times
the VDW radii, with a density of one to five
points per Å². This paradigm was reported to
give an adequate sampling to which the fitted
charges were fairly insensitive, at least at the
higher values. An improved procedure, the
restrained electrostatic potential fit (RESP),
was developed by Bayly et al. (41) to enhance
transferability of the resulting point charges.
Williams (42) developed a procedure to derive
the best fit to a given MEP with a defined set of
monopoles, dipoles, and so forth.
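To make the fitting step concrete, the following sketch fits atom-centered monopoles to a reference potential by least squares, with the total charge held fixed through a Lagrange multiplier. It is an illustration only, not the published procedure of Singh and Kollman, Bayly et al., or Williams: the function names, the random placement of points on spherical shells around a single center, and the use of atomic units (potential = charge / distance) are all assumptions made for the example, and no RESP-style restraint is included.

```python
import numpy as np

def fit_esp_charges(atom_xyz, grid_xyz, ref_potential, total_charge=0.0):
    """Least-squares monopole fit to a reference potential on sampling points."""
    atom_xyz = np.asarray(atom_xyz, dtype=float)
    grid_xyz = np.asarray(grid_xyz, dtype=float)
    v = np.asarray(ref_potential, dtype=float)
    # A[i, j] = 1 / |r_i - R_j|: potential at point i from a unit charge on atom j.
    dist = np.linalg.norm(grid_xyz[:, None, :] - atom_xyz[None, :, :], axis=2)
    A = 1.0 / dist
    n = atom_xyz.shape[0]
    # Stationarity conditions of |A q - v|^2 + lambda * (sum(q) - Q).
    kkt = np.zeros((n + 1, n + 1))
    kkt[:n, :n] = 2.0 * A.T @ A
    kkt[:n, n] = 1.0
    kkt[n, :n] = 1.0
    rhs = np.concatenate([2.0 * A.T @ v, [total_charge]])
    return np.linalg.solve(kkt, rhs)[:n]

def shell_points(center, radii, n_per_shell, seed=0):
    """Hypothetical sampler: random points on concentric spherical shells."""
    rng = np.random.default_rng(seed)
    shells = []
    for r in radii:
        d = rng.normal(size=(n_per_shell, 3))
        d /= np.linalg.norm(d, axis=1, keepdims=True)
        shells.append(np.asarray(center, dtype=float) + r * d)
    return np.vstack(shells)
```

In practice the reference potential would come from an ab initio calculation and the shells would follow the scaled VDW surface of the whole molecule rather than a single sphere.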

Typically, fragments of molecules of interest
are analyzed by ab initio techniques to generate
their MEPs, which are the reference for
parameterization of charge. Besler et al. (152)
reported fitting of atomic charges to the
electrostatic potentials calculated by the
semiempirical methods AM1 and MINDO. The MINDO
charges derived by fitting the MEP can be
linearly scaled to agree with results derived
from ab initio calculations. Among the
motivations for semiempirical methods are the
facts that semiempirical methods using
high-quality basis sets often yield better
results than ab initio techniques employing
minimal basis sets, and that computational time
is significantly reduced in moving from ab
initio to semiempirical calculations. Rauhut and
Clark (153) used the AM1 wave function to develop
a multicenter point-charge model in which each
hybrid natural atomic orbital is represented by
two charges located at the centroid of each
lobe. Thus, up to nine charges (two for each of
4 orbitals, plus 1 core charge) are used to
represent heavy atoms. Results using this
approach affirm the observations that
distributed charges are more successful than
atom-centered charges in reproducing
intermolecular interactions (154, 155).

2.2.2 Parameter Derivation for Force Fields.

Because molecular mechanics is empirical,
parameters are derived by iterative evaluation
of computational results, such as molecular
geometry (bond lengths, bond angles, dihedrals)
and heats of formation, compared with
experimental values (20). Lifson has coined the
expression "consistent" for force fields in
which structures, energies of formation, and
vibrational spectra have all been used in
parameterization by least-squares optimization.
In the case of bond lengths, bond angles, and
VDW parameters, crystallography has provided
most of the essential experimental database.
Major efforts (156) to derive general sets of
parameters from quantum mechanical calculation
have been made, especially for systems for which
adequate experimental data are unavailable.
Although quantum mechanics is certainly adequate
for initial approximations of parameters and
essential for charge approximations, a detailed
analysis indicates that in vacuo calculations
neglect many-body effects and can be misleading.
A major effort by Hehre (personal communication)
to derive parameters for water from extensive ab
initio calculations with large basis sets failed
even to give a parameter set that reproduced the
radial distribution for bulk water. Parameters
derived from relevant experimental data in the
condensed phase (especially if available in the
solvent of theoretical interest) are generally
more capable of accurately predicting results
because the many-body effects are implicitly
included in the parameterization. The basic
assumption is that these "effective" two-body
potentials implicitly incorporate many-body
interaction energies.

Jorgensen has derived parameters by fitting
Monte Carlo simulations to the properties of
bulk liquids, to give the AMBER/OPLS force field
(26, 157, 158). Conceptually, one is attracted
by the use of liquids and their observable
properties as constraints during the derivation
of a force field that is destined to study the
properties of solvated molecules.
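As a toy illustration of the least-squares parameterization discussed in this section, the sketch below fits the two parameters of a harmonic bond-stretch term, E(r) = E0 + 0.5 k (r - r0)^2, to a set of reference energies. The function name and the idea of fitting a single bond in isolation are assumptions made for the example; a real force field is fitted to many observables (geometries, heats of formation, vibrational spectra, liquid properties) at once.

```python
import numpy as np

def fit_harmonic_bond(r, energy):
    """Fit E(r) = E0 + 0.5*k*(r - r0)**2 to reference data by least squares.

    Expanding the harmonic term gives a quadratic a*r**2 + b*r + c with
    a = 0.5*k, b = -k*r0, c = E0 + 0.5*k*r0**2, so an ordinary polynomial
    fit recovers all three quantities.
    """
    a, b, c = np.polyfit(np.asarray(r, dtype=float),
                         np.asarray(energy, dtype=float), 2)
    k = 2.0 * a
    r0 = -b / (2.0 * a)
    e0 = c - 0.5 * k * r0 ** 2
    return k, r0, e0
```

The reference energies could be experimental or quantum mechanical; the point made in the text is that data taken from the condensed phase fold many-body effects into the fitted "effective" parameters.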

2.2.3 Modeling Chemical Reactions and Design of
Transition-State Inhibitors. In cases, such as
enzyme reactions, where chemical transformations
occur, quantum chemical methods must be used to
deal with electronic changes in hybridization
and bond cleavage (159, 160). Hybrid
applications (161-163), in which the reaction
core is modeled quantum mechanically and the
rest by molecular mechanics, would appear a
viable option. Alternatively, the geometry of
the transition state has been modeled by
molecular mechanics, with force constants
derived from ab initio calculations that predict
with amazing accuracy the relative selectivity
of reactions. Andrews and coworkers (164)
pioneered modeling of transition states (165) of
enzymatic reactions to design transition-state
inhibitors.

3 KNOWN RECEPTORS 

A significant challenge is the design of novel
ligands for therapeutic targets in which the
three-dimensional structure has been determined
by either X-ray crystallography or NMR (12, 13,
166). The availability of the coordinates of all
the atoms of the target suggests modeling of the
site and its interaction with prospective
ligands. Qualitative information can be
discerned by simple examination of complexes
with molecular graphics, and known ligands can
be improved by searching for accessory binding
interactions through ligand modification. This
approach was pioneered by groups at Wellcome
Research Laboratories (167-169) in designing
analogs of 2,3-diphosphoglycerate (Fig. 3.13),
to modulate oxygen binding to hemoglobin, and at
Burroughs-Wellcome (170), to enhance affinity of
dihydrofolate reductase (DHFR) antagonists. When
used in an iterative fashion, novel compounds
with improved affinity result (166, 171, 172).
Quantification of interactions and design of
novel ligands require application of molecular
and statistical mechanics to quantify the
enthalpy and entropy of binding. In other words,
experimental measurements reflect free energies
of binding, and both enthalpic and entropic
contributions must be estimated for prediction
of affinities as part of the design process.
When combined with combinatorial chemistry and
high throughput screening, rapid identification
of therapeutic candidates is feasible, as
witnessed in the case of factor Xa antagonists
(173) or TAR RNA inhibitors as possible HIV
drugs (174).
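The link between measured affinities and the enthalpic and entropic terms mentioned above is the standard thermodynamic relation dG = dH - T dS, together with dG = RT ln Kd for the dissociation constant. The short sketch below simply encodes these two relations; the function names, the choice of kcal/mol units, and the 1 M standard state are assumptions made for the illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def binding_free_energy(delta_h, delta_s, temperature=298.15):
    """dG = dH - T*dS, with dH in kcal/mol and dS in kcal/(mol K)."""
    return delta_h - temperature * delta_s

def dissociation_constant(delta_g, temperature=298.15):
    """Kd in mol/L (1 M standard state) from the binding free energy."""
    return math.exp(delta_g / (R_KCAL * temperature))

def free_energy_from_kd(kd, temperature=298.15):
    """Binding free energy in kcal/mol from a measured Kd in mol/L."""
    return R_KCAL * temperature * math.log(kd)
```

Because Kd depends exponentially on dG, an error of about 1.4 kcal/mol in either the enthalpic or the entropic estimate shifts the predicted affinity by roughly a factor of ten at room temperature.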

3.1 Definition of Site 

The availability of three-dimensional structural
information on a potential therapeutic target
does not guarantee identification of the site of
action of the substrate, or inhibitor, unless the
structure of a relevant complex has been
determined. In fact, conformational changes
often occur during binding of ligands to enzymes
that are not reflected in the three-dimensional
structure of the enzyme alone. Illustrative
examples are the major conformational changes
seen (175, 176) in HIV protease on binding the
inhibitor MVT-101 (Fig. 3.14) and the changes in
domain orientation observed (177) in the complex
of an anti-HIV peptide antibody with the peptide.
Until the two β-strand flaps have been folded
in, to complete the active site of HIV protease,
many of the important interactions for
recognition in this proteolytic system have not
been defined. In other cases of therapeutic
targets, allosteric sites are involved in
regulation of binding and cannot clearly be
discerned from the crystal structure available.
Here NMR offers a highly complementary approach
where transfer and isotope-edited NOEs as well
as magic angle spinning NMR on solid samples can
help identify those residues of the therapeutic
target (Fig. 3.15) involved in receptor
interaction (178-180).

Figure 3.13. Diphosphoglycerate (a) and analogs (b-d) designed to optimize interactions bound in
schematic model of hemoglobin. Used with permission (169).



One significant concern of structure-based
design is the dynamics of the target itself. How
stable is the active site to modifications in the
ligand? Are there alternative potential binding
sites that could compete for the ligand? The
geometrical identity of serine protease
catalytic residues, for example, argues that the
specificity essential for biological utility
ensures a relatively rigid three-dimensional
arrangement of functionality in the active site
that determines molecular recognition and
discrimination. The active site has had no
evolutionary pressure to optimize binding per
se, but rather rates of interaction and
discrimination among the limited repertoire of
the biological milieu. One classic example (181)
of difficulty in interpretation of binding as a
result of ligand modification occurred when an
analog designed to bind to a specific site on
hemoglobin actually found a more appropriate
site within the packed side chains of the
protein molecule (Fig. 3.16). This example
emphasizes the importance of protein dynamics.
Alternate conformations of the protein that are
easily accessible at room temperature may be
difficult to characterize experimentally because
of relatively low abundance and/or lack of
resolution of the experimental techniques used.
Computationally, they are problematic as well
because of the complexity of the energy surface
for a macromolecule.

Figure 3.14. Ribbon diagram of HIV-1 protease in the absence of inhibitor (a) and when bound to the
inhibitor MVT-101 (b). Diagrams based on crystal structures as reported by Miller et al. (175, 176).

Figure 3.15. Bound conformation of cyclosporin (a) as determined by NMR compared with solution
conformation (b) (178). Residues involved with interaction with cyclophilin are indicated on (a) in
bold.

3.2 Characterization of Site

3.2.1 Volume and Shape. Most substrate-enzyme or
receptor-ligand interactions occur within
pockets, or cavities, buried within proteins.
Inside these invaginations, a microenvironment
is established that favors desolvation and
binding of the ligand, despite the entropic cost
of fixing the relative geometries of the two
molecules. Knowledge of the three-dimensional
structure of such cavities can assist the study
of binding interactions and the design of novel
ligands as potential therapeutics. Several
algorithms to find, display, and characterize
cavity-like regions of proteins as potential
binding sites have been developed. Kuntz et al.
(13, 183) described a program, DOCK, to explore
the steric complementarity between ligands and
receptors of known three-dimensional structure.
Using the molecular surface of a receptor, a
volumetric representation of the chosen binding
cavity is approximated by use of a set of spheres
of various sizes that have been mathematically
"packed" within it (Fig. 3.17). The set of
distances between the centers of the spheres
serves as a compact representation of the shape
of the cavity. The use of the relative distance
paradigm allows comparison without the need for
orientation of one shape with respect to the
other. Potential ligands are characterized in a
similar fashion by generating a set of spheres
that mimic the shape of the ligand. Matching the
distance matrix of the cavity with that of a
potential ligand provides an efficient screen
for selection of complementary shapes.
Voorintholt et al.
(184) used three-dimensional lattices to calcu- 
late density maps of proteins. In these maps, 
lattice points were assigned as a function of 
the distance to the nearest atom. This tech- 
nique is effective in delineating regions of low 
density where channels and cavities exist. Ho 
and Marshall (185) implemented a search 
function in CAVITY to allow the investigator 
to isolate a single cavity of interest by specify- 
ing a seed point. From this seed point, the al- 
gorithm systematically explored the entire 
volume of the cavity, following its borders and 
effectively filling every crevice within it; that 





and receptor (185). At every cavity-pocket
interface point, the electrostatic potentials of
both the atoms forming the cavity and those of
the binding ligand are calculated. A rough
approximation of complementarity is computed by
multiplying these potentials together. A
favorable electrostatic interaction is produced
when the electrostatic potentials are opposite
in sign. Therefore, favorable interactions are
indicated when the product of these values is a
negative number. Likewise, unfavorable
interactions are indicated when the product of
these values is a positive number, that is, when
the potential of the cavity and that of the
binding ligand have the same sign. These products
are then normalized, assigned a color, and
displayed.
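The product-of-potentials measure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation (185): a bare Coulomb potential in arbitrary units is used for both cavity and ligand, the products are normalized by the largest magnitude, and the function names and the three text labels standing in for colors are hypothetical.

```python
import numpy as np

def coulomb_potential(points, atom_xyz, charges):
    """Potential at each point from a set of atom-centered point charges."""
    points = np.asarray(points, dtype=float)
    atom_xyz = np.asarray(atom_xyz, dtype=float)
    charges = np.asarray(charges, dtype=float)
    d = np.linalg.norm(points[:, None, :] - atom_xyz[None, :, :], axis=2)
    return (charges[None, :] / d).sum(axis=1)

def complementarity(points, site_xyz, site_q, lig_xyz, lig_q):
    """Normalized product of cavity and ligand potentials at interface points.

    Negative values indicate potentials of opposite sign (favorable);
    positive values indicate potentials of the same sign (unfavorable).
    """
    prod = (coulomb_potential(points, site_xyz, site_q)
            * coulomb_potential(points, lig_xyz, lig_q))
    scale = np.abs(prod).max()
    return prod / scale if scale > 0 else prod

def color_code(score):
    """Map each normalized product to a label standing in for a display color."""
    score = np.asarray(score, dtype=float)
    return np.where(score < 0, "favorable",
                    np.where(score > 0, "unfavorable", "neutral"))
```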

In a similar way, an estimate of the hydro- 
phobic character of a segment of the surface 
can be quantitated and indicated through 
color coding. The ability to rapidly switch be- 
tween these hydrophobic and electrostatic 
surface representations, to visually integrate 
the optimal complementarity between site 
and potential ligand to be designed, is helpful. 

3.3 Design of Ligands 

3.3.1 Visually Assisted Design. In the process
of optimization of a lead, one needs to ascertain
where modification is feasible. Although
visualization of the excess space available in
the active-site cavity by directly examining
ligands is useful for locating selected regions
where ligand modifications may be made, it is
not well suited for fully characterizing the
void that exists between the ligand and the
receptor, the ligand-receptor gap region;
information concerning the relative dimensions
of free space is difficult to discern. To
facilitate the display of this information, Ho
and Marshall (185) developed another algorithm
to color-code the cavity display by the
ligand-receptor nearest-atom gap distance. The
actual VDW surface-to-surface distance (not
center to center) between the ligand and enzyme
atoms is calculated. When the ligand-receptor
distances have been calculated at all
cavity-pocket interface lattice points, a
user-defined color-coding scale is implemented
to generate the displays. This highlights those
areas that are less well packed and available
for ligand modification.
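One way to read the gap-distance calculation is sketched below: at each interface lattice point, the nearest ligand atom and the nearest receptor atom are located, and the gap is their center-to-center distance minus both VDW radii. This interpretation, the function names, the bin edges, and the color names are all assumptions made for the illustration; the published algorithm (185) may differ in detail.

```python
import numpy as np

def gap_distance(point, lig_xyz, lig_r, rec_xyz, rec_r):
    """VDW surface-to-surface gap between the ligand and receptor atoms nearest a point."""
    point = np.asarray(point, dtype=float)
    lig_xyz = np.asarray(lig_xyz, dtype=float)
    rec_xyz = np.asarray(rec_xyz, dtype=float)
    lig_r = np.asarray(lig_r, dtype=float)
    rec_r = np.asarray(rec_r, dtype=float)
    # Nearest atom is judged by distance from the point to the atom's VDW surface.
    i = int(np.argmin(np.linalg.norm(lig_xyz - point, axis=1) - lig_r))
    j = int(np.argmin(np.linalg.norm(rec_xyz - point, axis=1) - rec_r))
    center_to_center = np.linalg.norm(lig_xyz[i] - rec_xyz[j])
    return float(center_to_center - lig_r[i] - rec_r[j])

def color_bin(gap, edges=(0.5, 1.5, 3.0),
              colors=("red", "yellow", "green", "blue")):
    """User-defined color scale: one color per interval between the edges."""
    return colors[int(np.searchsorted(edges, gap, side="right"))]
```

Lattice points falling in the widest bins mark the loosely packed regions where there is room to extend the ligand.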

3.3.2 Three-Dimensional Databases. Medicinal
chemists have recognized the potential of
searching three-dimensional chemical databases
to aid in the process of designing drugs for
known, or hypothetical, receptor sites. Several
databases are well known, such as the Cambridge
Crystallographic Database (CSD) (194). The
crystal coordinates of proteins and other large
macromolecules are deposited into the Brookhaven
Protein Databank (195). The conformations
present in crystallographic databases reflect
low energy conformers that should be readily
attainable in solution and in the receptor
complex. The three-dimensional orientation of
the key regions of the drug that are crucial for
molecular recognition and binding is termed the
pharmacophore. The investigator searches the
three-dimensional database through a query for
fragments that contain the pharmacophoric
functional groups in the proper
three-dimensional orientation. Using these
fragments as "building blocks," completely novel
structures may be constructed through assembly
and pruning (196). Receptor sites are complex
both in geometrical features and in their
potential energy fields, and many diverse
compounds can bind to the same protein by
occupying various combinations of subsites.
Noncrystallographic databases have been
developed as well. One example is the
three-dimensional database of structures from
Chemical Abstracts generated through CONCORD
(197-199) that contains over 700,000 entries.
The use of such databases is most applicable
when the binding of a particular ligand and its
receptor is well understood in terms of
functional group recognition, and a crystal
structure of the complex is known (200). One
approach to ligand design is to develop novel
chemical architectures (i.e., scaffolds) that
position the pharmacophoric groups, or their
bioisosteres, in the correct three-dimensional
arrangement.

Gund conceived the first prototypic program
designed to search for molecules that match
three-dimensional pharmacophoric patterns (201,
202). This program, MOLPAT, performed
atom-by-atom searches to verify comparable
interatomic distances between pattern and
candidate structures. Although rigorous, this
approach was tedious and required optimization.
Lesk (203) devised a method that used the
geometric attributes of the query to screen
potential candidates. Similarly, Jakes and
Willett (204) proposed that screens based on
interatomic distances and atom types could
considerably augment search efficiency.
Furthermore, Jakes et al. (205) showed that
methods widely used in two-dimensional structure
retrieval could be applied to three-dimensional
searches, to remove the vast majority of
compounds before more rigorous comparisons. This
was validated in test searches against a subset
of the CSD. This concept was furthered by
Sheridan et al. (200), who included screens based
on aromaticity, hybridization, connectivity,
charge, position of lone pairs, and centers of
mass of rings. To contain this wealth of
information, an inverted bit map [the presence
or absence of a feature is encoded as a 1 or 0
(bit) at a particular location in a "keyword"]
was employed for highly efficient screening of
hundreds of thousands of compounds in minutes.
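The two-stage search described above, a fast bit-map screen followed by a rigorous atom-by-atom distance comparison, can be sketched as below. This is a simplified illustration and not the code of MOLPAT or of any of the cited systems: the feature names, the data layout, the distance tolerance, and the brute-force permutation search are all assumptions chosen to keep the example short.

```python
from itertools import permutations
import math

def build_inverted_bitmap(compound_features):
    """Map each feature to an integer whose bit i is 1 when compound i has it."""
    bitmap = {}
    for i, features in enumerate(compound_features):
        for f in features:
            bitmap[f] = bitmap.get(f, 0) | (1 << i)
    return bitmap

def screen(bitmap, n_compounds, required_features):
    """Indices of compounds that contain every required feature (bitwise AND)."""
    hits = (1 << n_compounds) - 1
    for f in required_features:
        hits &= bitmap.get(f, 0)
    return [i for i in range(n_compounds) if (hits >> i) & 1]

def matches_pattern(atoms, pattern_types, pattern_dists, tol=0.5):
    """Atom-by-atom check of a three-dimensional pattern.

    atoms         : list of (atom_type, (x, y, z)) for the candidate
    pattern_types : atom type required at each pattern position
    pattern_dists : {(a, b): distance} between pattern positions a and b
    """
    for combo in permutations(range(len(atoms)), len(pattern_types)):
        if any(atoms[c][0] != t for c, t in zip(combo, pattern_types)):
            continue
        if all(abs(math.dist(atoms[combo[a]][1], atoms[combo[b]][1]) - d) <= tol
               for (a, b), d in pattern_dists.items()):
            return True
    return False
```

Only the compounds that survive `screen` need to be passed to the expensive `matches_pattern` step, which is the source of the efficiency gain reported for the screening approach.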

Similar database searching methods have been
incorporated into a number of current database
searching systems. Programs such as CAVEAT
(206), ALADDIN (Abbott) (207), 3DSEARCH
(Lederle) (208), MACCS-3D (209), CHEM-X (210),
UNITY (211), and others contain considerable
functionality useful for such an approach.
CAVEAT (206) is designed to assist a chemist in
identifying cyclic structures that could serve
as the foundation for novel compounds. In
particular, it allows an investigator to rapidly
search structural databases for compounds
containing substituent bonds that satisfy a
specific geometric relationship. ALADDIN (207),
3DSEARCH (208), MACCS-3D (209), and CHEM-X (210)
are similar, in that geometric relationships
between various user-defined atomic components
can be used as a query to retrieve matching
structures. Features have been included to allow
the user to delineate molecular characteristics
(atom type, bond angles, torsional constraints,
etc.) to ensure the retrieval of relevant
compounds. Additional constraints have been
incorporated into 3DSEARCH (208) and ALADDIN
(207), including the consideration of retrieved
ligand-receptor volume complementarity.
Furthermore, CHEM-X (210) performs a rule-based
conformational search on each structure in the
database to account for conformational
flexibility. For a comprehensive review of
three-dimensional chemical database searching,
see Martin et al. (212, 213).

Pharmaceutical companies have developed
three-dimensional databases for their compound
files to help prioritize candidates for
screening (210, 214). An essential component in
such a system is a method for assessing
similarity (212, 215). Because most compound
databases were entered as two-dimensional
structures, this has required conversion to a
three-dimensional format. Programs have proved
(197-199, 216) useful in generating plausible
three-dimensional structures from the
connectivity data, as reviewed by Sadowski and
Gasteiger (217). Because of the inherent
flexibility in most compounds, the use of a
single conformation to represent the
three-dimensional potential for interaction of a
molecule is a clear limitation. Development of
three-dimensional databases with a compact,
coded representation of the conformational
states available to each compound is a logical
next step. Efficient use of such a database
requires methods for evaluating
three-dimensional similarities. In addition to
identification of compounds that can present an
appropriate three-dimensional pattern,
compounds must also fit within the receptor
cavity. Based on a shape-matching algorithm,
Sheridan et al. (200) screened candidate
compounds to select those whose volumes would
fit within the combined volumes of known active
compounds. Previously, this group used (218) the
same algorithm to help identify potential
ligands for papain and carbonic anhydrase, by
screening compounds from the CSD. Screening of
the active site of HIV protease identified (219)
haloperidol (Fig. 3.20) as an inhibitor of the
enzyme and provided a novel chemical lead for
further investigation. Burt and Richards (220)
introduced flexible fitting of molecules to a
target structure, with assessment of molecular
similarity as a means of dealing with the
conformational problem.

The use of preliminary screens can elimi- 
nate the vast majority of compounds before 
more rigorous, and computationally demanding,
pattern-matching comparisons (212, 213).

Figure 3.20. Structure of bromperidol (top) found
by DOCK program when used on active site of HIV-1
protease (219) compared with structure of JG-365
(bottom), a typical substrate-derived inhibitor.

This search strategy is indeed very quick and 
efficient; however, all retrieved compounds 
must contain every query component as de- 
fined in the preliminary screens. As the num- 
ber and complexity of the query elements in- 
crease, one would anticipate fewer true hits, 
but a corresponding rise in the number of 
near-misses. If such near-misses could be re- 
covered, effective ligands may simply arise 
from slight conformational modification to 
maximize receptor interactions. Furthermore, 
the retrieval and combinatorial assembly of 
numerous pharmacophore subcomponents 
would intuitively produce many more diverse 
structures than the quest for a single com- 
pound in the database incorporating the en- 
tire pharmacophore, that is, all requirements 
of the query. This suggests an approach that 
would retrieve compounds containing any 
combination of a minimum number of match- 
ing pharmacophoric elements. 
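
As a minimal sketch of this partial-match idea, the Python fragment below retrieves every compound that presents at least a chosen minimum number of query elements. The feature labels, compound names, and the reduction of three-dimensional matching to simple set membership are illustrative assumptions; a real search would also test the geometry between the matched elements.

```python
def partial_matches(query, compound_features, min_matches):
    """Return the matched query elements, or None if too few are present."""
    found = [element for element in query if element in compound_features]
    return found if len(found) >= min_matches else None


def search(query, database, min_matches):
    """Map compound name -> matched query elements for every hit."""
    hits = {}
    for name, features in database.items():
        matched = partial_matches(query, features, min_matches)
        if matched is not None:
            hits[name] = matched
    return hits


# Hypothetical query and database; all labels are illustrative only.
QUERY = ["donor_A", "acceptor_B", "hydrophobe_C", "cation_D"]
DATABASE = {
    "cmpd_1": {"donor_A", "acceptor_B", "hydrophobe_C", "cation_D"},
    "cmpd_2": {"donor_A", "hydrophobe_C"},
    "cmpd_3": {"acceptor_B"},
    "cmpd_4": {"acceptor_B", "cation_D", "donor_A"},
}

# Requiring every element gives the strict search; relaxing the minimum
# recovers the near-misses as well.
full_hits = search(QUERY, DATABASE, min_matches=len(QUERY))
partial_hits = search(QUERY, DATABASE, min_matches=2)
```

With the strict criterion only one compound survives, whereas the relaxed criterion also returns the two near-misses, which could then be trimmed and recombined.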

Methods have been developed that employ 
this "divide-and-conquer" approach to ligand 
development. The active site is partitioned 
into subsites, each containing several pharma- 
cophoric elements. Chemical fragments com- 
plementary to each subsite are then designed 
or retrieved from databases. Finally, frag- 
ments are linked to form aggregate ligands. 
The advantage of this approach is that ligand 
diversity can be tremendously augmented 
through the combinatorial assembly of numerous
subcomponents. DesJarlais was perhaps
the first to employ this philosophy in a
novel application (220) of the program DOCK. 
This well-known program searches three-di- 
mensional databases of ligands and deter- 
mines potential binding modes of any that will 
fit within a target receptor (183). However, 
only a single, static conformation of each da- 
tabase structure is maintained, disregarding 
ligand flexibility. In DesJarlais’ method,
conformational flexibility was later introduced by
dividing individual ligands into fragments 
overlapping at rotatable bonds. Each frag- 
ment was first docked separately into various 
receptor regions. Attempts were then made to 
reassemble the component parts into a legiti- 
mate structure. A current example of this ap- 
proach is the program LUDI, written by Bohm 
(221, 222). In this program, a receptor volume
of interest is scanned to determine subsites 
where hydrogen bonding or hydrophobic con- 
tact can occur. Small complementary mole- 
cules are then chosen from a database and po- 
sitioned within these subsites to optimize 
binding energy. The process concludes with 
the selection of various bridging fragments to 
link subsets of small molecules. 

Chau and Dean published a series of arti- 
cles addressing whether small molecular frag- 
ments, with transferable properties, could be 
generated for further use in automated site- 
directed drug design (223-225). A program 
was developed to combinatorially generate all 
three-, four-, and five-atom fragments con- 
taining any geometrically allowed combina- 
tion of H, C, N, O, F, and Cl. Aromatic frag- 
ments were produced as well. Searches of the 
Cambridge Structural Database (194) were 
performed to determine the most frequently 
occurring fragments. To utilize these frag- 
ments as components for ligand assembly, 
more data were necessary to better character- 
ize them. They were analyzed, therefore, to 
statistically ascertain bond lengths from the 
CSD to provide some geometrical constraints 
for structure assembly. Finally, the transfer- 
ability of atomic residual charges was studied 
by comparing charges generated for the atoms 
in each fragment with charges calculated for 
whole molecules containing the fragment. 

Another approach, FOUNDATION (226), 
searches three-dimensional databases of 
chemical structures for a user-defined query 
consisting of the coordinates of atoms and/or
bonds. All possible structures that contain any 
combination of a user-specified minimum 
number of matching atoms and/or bonds are 
retrieved. Combinations of hits can be gener- 
ated automatically by a companion program 
(104), SPLICE, which trims molecules found
from the database to fit within the active site 
and then logically combines them by overlap- 
ping bonds to maximize their interactions 
with the site (Fig. 3.21). The addition of bridg- 
ing fragments to those recovered from the da- 
tabase allows generation of many novel li- 
gands for further evaluation. 

3.3.3 De Novo Design. Design of novel 
chemical structures that are capable of inter- 
acting with a receptor of known structure uses 
methodology that is much more robust, given 
that the geometric foundations of molecular 
sciences are much firmer than the thermody- 
namic ones. Techniques for the design of novel 
structures to interact with a known receptor 
site are becoming more available and show 
promise (227-229). It has become quite evi- 
dent that much of a molecule acts simply as a 
scaffold to align the appropriate groups in the 
three-dimensional arrangement that is crucial 
for molecular recognition. By understanding 
the pattern for a particular receptor, one can 
transcend a given chemical series by replacing 
one scaffold with another of geometric equiv- 
alence. This offers a logical way to dramati- 
cally change the side-effect profile of the drug 
as well as its physical and metabolic at- 
tributes. Various software tools are already 
under development to assist the chemist in 
this design objective. Lewis and Dean de- 
scribed their approaches to molecular tem- 
plates in a series of papers (230, 231). An al- 
ternative approach, BRIDGE (Dammkoehler 
et al., unpublished), is based on geometric gen- 
eration of possible cyclic compounds as scaf- 
folds, given constraints derived from the types 
of chemistry the chemist is willing to consider. 
Nishibata and Itai (232, 233) published a 
Monte Carlo approach to generating novel 
structures that fit a receptor cavity. Pearlman 
and Murcko (234) combined a similar approach
with molecular dynamics with illustrative ap- 
plications to HIV protease and FK506 binding 
protein. CAVEAT is a program developed by 



Bartlett to find cyclic scaffolds (207) by searching
the CSD (195) for the correct vectorial arrangement
of appended groups.

All of these approaches attempt to help the 
chemist discover novel compounds that will be 
recognized at a given receptor. Van Drie et al. 
(207) described a program, ALADDIN, for the 
design or recognition of compounds that meet 
geometric, steric, or substructural criteria, 
and Bures et al. (235) described its successful 
application to the discovery of novel auxin 
transport inhibitors. As our knowledge base of 
receptors grows, such tools will prove increas- 
ingly useful. The ability to transcend the 
chemical structure of lead compounds, while 
retaining the desired activity, should dramatically
improve the ability to design away undesirable
side effects. Bohm developed the program
LUDI (221, 222) to construct ligands for
active sites with an empirical scoring function 
to evaluate their construction. 

3.3.4 Docking. The search for the global 
minimum, or the complete set of low energy 
minima, on the free energy surface when two 
molecules come in contact is commonly re- 
ferred to as the "docking" problem [(236); see 
also Leach (21)]. Any useful molecular docking 
program must be computationally efficient in 
determining the most favorable binding mode, 
sufficiently sensitive in its scoring function to 
discriminate between alternate binding 
modes and the correct mode, and robust 
enough to allow various ligand-receptor sys- 
tems to be studied. 

3.3.4.1 Docking Methods. In the case of
two proteins of known structure that can be
approximated as rigid bodies, there are 6 degrees
of freedom, the relative position (x, y,
and z coordinates), and relative orientation 
(roll, pitch, and yaw to use the aeronautical 
expressions) to be explored. Several very intel- 
ligent approaches to this problem have been 
developed. The first and most well known ap- 
proach is the DOCK program (http://www. 
cmpharm.ucsf.edu/kuntz/dock.html) (183) that 
was developed to solve the ligand-receptor 
problem. This program uses abstract representations
(a set of spheres) of the concave
shape on the receptor to be filled and the convex
ligand and matches them to generate
plausible binding modes with complementary

Figure 3.21. Combination by SPLICE (104) of fragments that bind to different subsites of NADP
binding site of DHFR to generate a more optimal ligand.

surfaces. An example of the successful use of
DOCK was the identification of 13 inhibitors
of DHFR from P. carinii selected from the
Fine Chemicals Directory. Of 40 compounds
predicted to be active, these 13 showed IC50
values less than 150 micromolar. DOCK (13, 
183) has been quite successful in finding non- 
congeneric molecules of the correct shape to 
interact with a receptor cavity (237-239). An 
overview of docking and scoring functions is 
available (240). 

Another approach focusing on complementary
surface maximization uses a grid representation
of the surface in a series of slices. The
slices from the target molecule are processed
against the slices from the other molecules by
use of a variant of the fast-Fourier transform
(241-244) to identify those sections with the 
greatest complementarity. This approach has 
been incorporated and extended to electrostatic 
complementarity in FTDOCK (http://www. 
bmm.icnet.uk/ftdock/ftdock.html) by Gabb 
et al. (245). This approach is a relatively fast
method for searching the 6 degrees of freedom
and has reproduced the binding mode of several
macromolecular complexes. It is also available
in GRAMM (Global Range Molecular
Matching, http://reco3.musc.edu/gramm/),
which was judged the best at identifying
the binding modes in a set of macromolecular
complexes at the second (Fall, 1996)
CASP evaluation of prediction methods.
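
The grid correlation that these methods evaluate can be sketched in a few lines of Python. The example below scans every translation by brute force; the Fourier-transform methods cited above obtain the same correlation for all translations at once, and rotations would be handled by repeating the scan for each orientation. The grid size, the +1 surface value, and the -10 core penalty are illustrative assumptions, not values taken from the cited programs.

```python
def correlation_score(receptor, ligand, shift):
    """Sum receptor * ligand grid values with the ligand translated by shift."""
    n = len(receptor)
    dx, dy, dz = shift
    total = 0
    for (x, y, z), weight in ligand.items():
        i, j, k = x + dx, y + dy, z + dz
        # Ligand cells that fall outside the grid contribute nothing.
        if 0 <= i < n and 0 <= j < n and 0 <= k < n:
            total += receptor[i][j][k] * weight
    return total


def best_translation(receptor, ligand):
    """Exhaustively scan all grid translations; return (score, shift)."""
    n = len(receptor)
    best = None
    for dx in range(n):
        for dy in range(n):
            for dz in range(n):
                score = correlation_score(receptor, ligand, (dx, dy, dz))
                if best is None or score > best[0]:
                    best = (score, (dx, dy, dz))
    return best


# Toy 5 x 5 x 5 receptor grid: +1 marks surface cells, -10 marks the core.
N = 5
receptor = [[[0] * N for _ in range(N)] for _ in range(N)]
receptor[2][2][2] = receptor[3][2][2] = 1
receptor[2][2][1] = receptor[3][2][1] = -10

# Two-cell ligand described by occupied cells relative to its own origin.
ligand = {(0, 0, 0): 1, (1, 0, 0): 1}

best_score, best_shift = best_translation(receptor, ligand)
```

The best translation places both ligand cells on surface cells, while any overlap with the core is penalized heavily.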

Obviously, other degrees of freedom should 
be included to allow both molecules to undergo 
conformational changes (side-chain relaxation,
at the very least, in the case of proteins).
In many cases, the active site of the receptor is 
assumed to be rigid (rationalized on the basis 
of the specificity and affinity of the system) 
and a flexible ligand is docked. This limits the 
number of degrees of freedom to be explored. 

By simply generating a set of low energy conformers
of the ligand and processing them sequentially
with DOCK (220), one can sample
on a low resolution scale; the flexible ligand
problem can be addressed on the basis of shape
complementarity.

FlexX is a program for flexibly docking li- 
gands into binding sites, by use of an incre- 
mental construction algorithm that builds the 
ligands in the binding site (246). It starts by 
extracting a core fragment from the ligand. 



The algorithm is dependent on selection of an 
appropriate base fragment, requiring one that 
makes enough specific contacts with the pro- 
tein that a definite preference for binding ori- 
entation can be determined. FlexX holds bond 
lengths and angles invariant, using the values 
of the input ligand. The core is used as the base 
to which low energy fragment conformers are 
added, with these conformers based on a sta- 
tistical evaluation of fragments in the Cam- 
bridge Structural Database. 

3.3.4.2 Scoring Functions (247-260).
Three-dimensional quantitative structure-activity
relationship (3D QSAR) approaches
based on the use of training sets of structures 
with measured affinities are often used to gen- 
erate a model with predictive powers. The lim- 
itation in such methodologies is the necessity 
for a robust training set of diverse chemical 
structures to encompass the domain of possi- 
ble interactions with the therapeutic target. 
At the beginning of a project, or when three- 
dimensional information on a novel target 
first becomes available, such data on a diverse 
set of chemical ligands are usually not avail- 
able. For this reason, one would like to capital- 
ize as much as possible on the physical chem- 
istry of the possible interactions between the 
ligand and its receptor when the structure of 
the receptor is available. Because of the need
to prioritize synthesis in structure-based design
efforts, to prioritize compounds in combinatorial
libraries for screening, and to
predict the structure of protein complexes, an
increased interest in scoring functions (i.e.,
empirical approaches to predict affinities)
has emerged. Several early attempts and
their reported predictive ability are cited next. 

1. Bohm (221, 222) analyzed 45 protein-ligand
complexes (affinity range = -9 to
-76 kJ/mol) and found the following equation
by multiple regression analysis:

$$\Delta G\ (\text{kJ/mol}) = 5.4\,\Delta G_{0} - 4.7\,\Delta G_{hb} - 8.3\,\Delta G_{ionic} - 0.17\,\Delta G_{lipo} + 1.4\,\Delta G_{rot}$$

$$r^{2} = 0.76,\quad S = 7.9,\quad q^{2} = 0.696,\quad S(\text{press}) = 9.3\ \text{(2.2 kcal/mol)}$$

2. Krystek et al. (261) analyzed 19 protein-ligand
complexes in an update of the Novotny
approach (262).

^ (^binding (kcal/mol) H O.O 25 ACt 03 ^ 



$$r^{2} = 0.69,\quad S = 4.0$$

3. VALIDATE is a hybrid approach to predict 
the binding affinity of novel ligands for 
a receptor of known three-dimensional 
structure based on the calculation of sev- 
eral physicochemical properties of the li- 
gand itself as well as a molecular mechanics 
analysis of the receptor-ligand complex 
(263). The properties of a diverse training 
set (-log range = 2.47-14.00) of 51 
crystalline complexes were analyzed by 
partial least squares (PLS) statistical 
methodology and neural network analysis 
to select a statistical model from a variety 
of parameters with the following proper- 
ties: 

$$r^{2} = 0.81,\quad S = 1.15,\quad q^{2} = 0.72,\quad S(\text{press}) = 1.29\ \text{(1.75 kcal/mol)}$$
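
To make the first regression in this list concrete, the sketch below evaluates Bohm's equation for a hypothetical complex. It assumes that each term is read as a coefficient (in kJ/mol) multiplied by a simple descriptor: a constant, the number of ideal hydrogen bonds, the number of ideal ionic interactions, the lipophilic contact area in square angstroms, and the number of rotatable bonds. Penalties for non-ideal geometry are omitted, and the input values are invented for illustration.

```python
KJ_PER_KCAL = 4.184


def bohm_score(n_hbonds, n_ionic, lipophilic_area, n_rotatable):
    """Estimated binding free energy in kJ/mol (more negative = tighter).

    Assumes ideal geometry for every hydrogen bond and ionic interaction.
    """
    return (5.4                        # constant term
            - 4.7 * n_hbonds           # per ideal hydrogen bond
            - 8.3 * n_ionic            # per ideal ionic interaction
            - 0.17 * lipophilic_area   # per square angstrom of contact
            + 1.4 * n_rotatable)       # per rotatable bond


# Hypothetical ligand: 3 hydrogen bonds, 1 salt bridge,
# 200 square angstroms of lipophilic contact, 4 rotatable bonds.
dg_kj = bohm_score(n_hbonds=3, n_ionic=1, lipophilic_area=200.0, n_rotatable=4)
dg_kcal = dg_kj / KJ_PER_KCAL
```

For these inputs the estimate is -45.4 kJ/mol, which lies inside the affinity range of the complexes used to derive the regression.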

The true measure of any model rests in its 
ability to predict the affinity of new com- 
pounds. This would include the prediction of 
unique ligands bound to receptors that exist in 
the base set as well as the affinities of unique 
ligand/receptor complexes. Three separate 
test sets were compiled for this purpose. The 
first set consisted of 14 inhibitors that were 
obtained from crystalline receptor/ligand com- 
plexes. Neither ligands nor their receptor 
classes were included in the training set.
Included were 2 DHFR, 2 penicillopepsin, 3
carboxypeptidase, 2 alpha-thrombin, and 2
trypsinogen inhibitors as well as 3 DNA-binding
molecules. Prediction of binding affinities
gave a PLS predictive $r^{2}$ = 0.786, with an absolute
average error of 0.693 log units. The
second test set consisted of 13 HIV protease 
inhibitors whose initial conformation and 
alignment were derived from the CoMFA 
analysis done by Waller et al. (264). The selec- 
tion of the inhibitors was based on maintain- 



ing a good range of activity as well as using 
several inhibitors from the published test set. 
The PLS predictive $r^{2}$ value was 0.565, with an
absolute average error of 0.694. The predictive
$r^{2}$ value is considerably lower than that of the
first test set, although this is attributed to the 
smaller range and distribution of activity in 
this set. The absolute average error is almost 
identical. 
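
The effect of activity range on the predictive statistic can be illustrated with a short calculation. The sketch below assumes one common definition of predictive r-squared, namely one minus the ratio of the squared prediction errors to the squared deviations of the test-set activities from the training-set mean; the source does not state which definition was used, and the data are invented.

```python
def predictive_r2(observed, predicted, training_mean):
    """1 - PRESS / SD, with SD taken about the training-set mean."""
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sd = sum((o - training_mean) ** 2 for o in observed)
    return 1.0 - press / sd


def mean_absolute_error(observed, predicted):
    """Average absolute prediction error in the units of the activity."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


TRAINING_MEAN = 7.0  # hypothetical mean activity of the training set

# Two invented test sets with identical 0.7 log-unit errors per compound.
wide_obs = [4.0, 6.0, 8.0, 10.0]
wide_pred = [4.7, 5.3, 8.7, 9.3]
narrow_obs = [6.0, 6.5, 7.5, 8.0]
narrow_pred = [6.7, 5.8, 8.2, 7.3]

r2_wide = predictive_r2(wide_obs, wide_pred, TRAINING_MEAN)
r2_narrow = predictive_r2(narrow_obs, narrow_pred, TRAINING_MEAN)
mae_wide = mean_absolute_error(wide_obs, wide_pred)
mae_narrow = mean_absolute_error(narrow_obs, narrow_pred)
```

Both sets have the same average error, yet the set with the narrower activity range gives a much lower predictive r-squared (about 0.22 versus 0.90), which is the behavior described above.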

Although shape complementarity is an im- 
portant consideration and shows correlation 
with the energy of interaction, it does not con- 
sider the electrostatics of the system (the rel- 
ative positioning of hydrogen-bond donors and 
acceptors, etc.). More sophisticated energetic 
functions are often used to refine the candi- 
date binding modes found by DOCK, or in the 
docking process itself. The assumption of rigid 
geometry for the receptor allows a preprocess- 
ing of the energetic contribution of the recep- 
tor to each grid point of a lattice constructed 
within the active site cavity (131, 265, 266). 
This allows a simple estimation of the energy 
of interaction of each atom in the ligand by 
finding the energy of the lattice points that are 
closest followed by interpolation. By increas- 
ing the efficiency of the scoring function, more 
candidate binding modes can be evaluated 
and, thus, one resembling the global minimum 
is more likely to be found. This assumes that
the scoring function used is sufficiently accurate
to discriminate between the correct binding
mode and others, and the problem is simply
one of sampling. Most scoring functions
used, however, deal almost exclusively with
the enthalpy of binding and ignore the entropy
of binding. It should not be surprising, therefore,
that the agreement between the predicted
binding modes and those observed experimentally
is not always perfect. As one is
attempting to discriminate between alternate 
binding modes of the same complex, difficul- 
ties in estimating entropy and desolvation are 
minimal because many of the terms (solvation 
and entropy of isolated ligand and receptor) in 
the comparison cancel. 
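
The lattice preprocessing described in this paragraph can be sketched as follows. The receptor potential is tabulated once on a regular grid, and the energy at each ligand atom is then obtained by trilinear interpolation between the eight surrounding lattice points. The potential function, grid dimensions, and atom coordinates are placeholders chosen for illustration; an actual implementation would tabulate van der Waals and electrostatic terms for each probe atom type.

```python
import math


def build_grid(potential, origin, spacing, n):
    """Tabulate potential(x, y, z) on an n x n x n lattice."""
    ox, oy, oz = origin
    return [[[potential(ox + i * spacing, oy + j * spacing, oz + k * spacing)
              for k in range(n)] for j in range(n)] for i in range(n)]


def interpolate(grid, origin, spacing, point):
    """Trilinear interpolation; point must lie inside the lattice."""
    fx, fy, fz = ((point[a] - origin[a]) / spacing for a in range(3))
    i, j, k = int(math.floor(fx)), int(math.floor(fy)), int(math.floor(fz))
    tx, ty, tz = fx - i, fy - j, fz - k
    value = 0.0
    for di, wx in ((0, 1.0 - tx), (1, tx)):
        for dj, wy in ((0, 1.0 - ty), (1, ty)):
            for dk, wz in ((0, 1.0 - tz), (1, tz)):
                value += wx * wy * wz * grid[i + di][j + dj][k + dk]
    return value


def ligand_energy(grid, origin, spacing, atoms):
    """Sum the interpolated receptor potential over all ligand atoms."""
    return sum(interpolate(grid, origin, spacing, atom) for atom in atoms)


def toy_potential(x, y, z):
    """Placeholder potential; linear, so interpolation reproduces it exactly."""
    return 2.0 * x - y + 0.5 * z


ORIGIN, SPACING, SIZE = (0.0, 0.0, 0.0), 1.0, 5
grid = build_grid(toy_potential, ORIGIN, SPACING, SIZE)
atoms = [(1.3, 0.4, 2.6), (2.5, 2.5, 2.5)]
energy = ligand_energy(grid, ORIGIN, SPACING, atoms)
```

Because the expensive sum over receptor atoms is done once when the grid is built, each candidate binding mode costs only a few lookups per ligand atom.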

3.3.4.3 Search for the Correct Binding
Mode (267-283). Just as there are many different
approaches to the global minimization
problem, most, if not all, have been applied to 
the docking problem. These include molecular 
dynamics, Monte Carlo sampling, systematic 
search (284), the genetic algorithm (101, 102,
105, 285, 286), and straight derivative optimi- 
zation with multiple starting geometries. A 
combination of MD/MC has been shown (287, 
288) to be a fairly efficient method for determining
the free energy surface in smaller host-guest
systems (289). The combination of molecular
dynamics to locally sample with Monte
Carlo that allows for conformational transi- 
tions provides adequate sampling if sufficient 
computational resources are available. 

Wasserman and Hodge (290) used molecular
dynamics to dock thermolysin inhibitors to
an approximate model of the enzyme, with
flexibility in the active site (38 of 314 residues)
and ligand and with the rest of the enzyme
represented by a grid approximation. A solvation
model was used to compensate for desolvation
in complex formation. To get 22 of 25
runs to orient the hydroxamate function correctly,
the hydroxamate oxygens of the starting
conformation were initialized within 4 Å of
the zinc. If they were allowed to vary to 8 Å, then
only 3 of 24 runs placed the ligand correctly.
Obviously, there is a serious sampling problem. 

Desmet et al. (291) used a truncated (dead- 
end elimination) search procedure to bind 
flexible peptides to the MHC I receptor. The 
translation/rotational space covered 6636 rel- 
ative orientations and each nonglycine/proline 
residue of the peptide had 47 main-chain con- 
formers. Side chains had threefold rotations 
about their chi angles and 28 side chains of the 
receptor were allowed to rotate. Seventy-four 
low energy structures were obtained with an 
average rmsd of 1 Å. The lowest energy structure
had an rmsd of 0.56 Å. Peptides up to 20
residues were docked with this procedure. 

King et al. (292) used an empirical binding 
free-energy function when docking MVT-101 
to HIV protease. Forty-nine translation/rotations
were examined with the Ponder/Richards
rotamer library. Only a limited number of 
rotamers for each amino acid were examined: 
Thr(2), Ile(3), Nle(3), Nle(3), Gln(6), and 
Arg(5). According to the authors, 2.24 x 10^® 
discrete states were examined. Sixty-four low 
energy structures with an average rmsd of
1.36 Å were found. If the CHARMM potential
was used with the same protocol, then the average
rmsd was increased to 1.68 Å.

The genetic algorithm has been used by 



several groups (101, 102, 105, 285, 286, 293) to
optimize the scoring function used. Encoding 
of the conformation of the ligand by torsional 
degrees of freedom and generating increas- 
ingly more fit sets of progeny by mutation and 
crossover have proved to be an effective search 
strategy. In one example (285), a Gray-coded 
binary string was used for the three transla- 
tions, three rotations, and bond rotations that 
specified the binding mode, and a two-point 
crossover operator was used in the GA algo- 
rithm. In the four examples of complexes with 
known crystal structures, the results of rigid- 
body docking with a straightforward applica- 
tion of the GA were not encouraging, in that 
the correct binding mode was identified in 
only two of the four test cases. Restraining the 
GA to search subdomains (different binding 
hypotheses) in a systematic manner corrected 
this problem. Only the ligand was allowed 
flexibility and the GA procedure was repeated. 
Several binding modes similar to that seen in 
the experimental complex were found in each 
example, but ones with the lowest energy did 
not necessarily have the lowest rms from the 
experimental, pointing out deficiencies in the 
AMBER-like scoring function used. 
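
The encoding and crossover steps mentioned above can be sketched briefly. The fragment below Gray-codes integer-valued degrees of freedom into a bit string and applies a two-point crossover and a bit-flip mutation. The 8-bit resolution, the two-gene chromosome, and the mutation operator are assumptions for illustration; the scoring and selection steps of a complete genetic algorithm are omitted.

```python
import random


def to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def from_gray(g):
    """Invert to_gray."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def encode(value, bits):
    """Integer -> list of Gray-coded bits, most significant first."""
    g = to_gray(value)
    return [(g >> (bits - 1 - i)) & 1 for i in range(bits)]


def decode(bit_list):
    """List of Gray-coded bits -> integer."""
    g = 0
    for bit in bit_list:
        g = (g << 1) | bit
    return from_gray(g)


def two_point_crossover(parent_a, parent_b, rng):
    """Swap the segment between two random cut points."""
    a, b = sorted(rng.sample(range(1, len(parent_a)), 2))
    child_a = parent_a[:a] + parent_b[a:b] + parent_a[b:]
    child_b = parent_b[:a] + parent_a[a:b] + parent_b[b:]
    return child_a, child_b


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability rate."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in chromosome]


# Hypothetical chromosomes: two torsions, each stored as an 8-bit value
# (0-255 mapped onto 0-360 degrees by the caller).
rng = random.Random(0)
parent_a = encode(37, 8) + encode(200, 8)
parent_b = encode(128, 8) + encode(5, 8)
child_a, child_b = two_point_crossover(parent_a, parent_b, rng)
mutant = mutate(child_a, 0.05, rng)
```

Gray coding ensures that adjacent integer values differ by a single bit, so a single mutation can produce a small change in the encoded torsion or translation.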

Generally, no single scoring function can 
accurately predict the binding affinities for all 
types of ligands with all types of receptors. 
Consensus scoring (294, 295) is the simulta- 
neous use of multiple different scoring func- 
tions to make virtual screening more predic- 
tive. CScore (Tripos, Inc.) is a consensus- 
scoring program that integrates several well- 
known scoring functions from the scientific 
literature. Each individual scoring function is 
used to predict the affinity of ligands in candi- 
date complexes. CScore also creates a consen- 
sus column, containing integers that range 
from 0 to the total number of scoring func- 
tions. Each complex whose score exceeds the 
threshold for a particular function adds 1 to 
the value of the consensus; configurations be- 
low the threshold contribute a zero. Consen- 
sus columns can also be calculated from any 
combination of externally supplied indicators, 
so that key aspects of binding (e.g., the pres- 
ence of a specific hydrogen bond) can be used 
to discriminate good configurations from bad 
ones. CScore can be used to rank multiple configurations
of the same ligand docked with a
receptor, or to rank selected configurations of 
different ligands docked to the same receptor. 
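
The consensus column described above amounts to a simple count. The sketch below is not the CScore implementation; it only illustrates the counting rule, assuming that a higher value is better for every scoring function. The function names, thresholds, and scores are hypothetical.

```python
def consensus(scores, thresholds):
    """Count the scoring functions whose score exceeds its threshold."""
    return sum(1 for name, value in scores.items() if value > thresholds[name])


# Hypothetical thresholds and per-pose scores for three scoring functions.
thresholds = {"score_a": 5.0, "score_b": 30.0, "score_c": 0.6}
poses = {
    "pose_1": {"score_a": 6.2, "score_b": 41.0, "score_c": 0.7},
    "pose_2": {"score_a": 4.1, "score_b": 35.0, "score_c": 0.5},
    "pose_3": {"score_a": 3.0, "score_b": 12.0, "score_c": 0.2},
}

# Consensus ranges from 0 to the number of scoring functions.
consensus_column = {name: consensus(s, thresholds) for name, s in poses.items()}
ranked = sorted(poses, key=lambda name: consensus_column[name], reverse=True)
```

An externally supplied indicator, such as the presence of a required hydrogen bond, could be added as one more entry with a threshold between 0 and 1.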

These approaches implicitly assume that 
the observed receptor cavity has some physical 
stability (i.e., a static view) and is not signifi- 
cantly altered by binding of different ligands. 
Although there is no guarantee that this is 
true for any particular case under study, the 
specificity seen in biological systems argues 
that a receptor site has some functional signif- 
icance in imposing its specific steric and elec- 
trostatic characteristics in the molecular rec- 
ognition and selection process. One must 
always be prepared, however, for binding to 
sites other than that targeted, and possible 
exposure of cryptic sites that are not observed 
in the absence of the ligand (181). The current 
computational limits in molecular dynamics 
simulations restrict the chance of uncovering 
such alternative binding modes in our studies. 
If we can assume the binding mode of our can- 
didate drug is nearly identical to that of a 
known compound, however, then we have a 
legitimate basis for thermodynamic perturba- 
tion calculations. Multiple or alternate bind- 
ing modes remain a major fly in the ointment. 
Naruto et al. (284) demonstrated a systematic 
approach to the determination of productive 
binding modes for mechanism-based inhibi- 
tors (Fig. 3.22) that could select starting struc- 
tures for complexes for molecular dynamics 
simulations. Combinations of methods, such 
as Monte Carlo or systematic search, to gener- 
ate multiple starting configurations for simu- 
lations to improve sampling and thermody- 
namic reliability will increase as adequate 
computational power to support these hybrid 
approaches becomes more readily available. 

Many technical limitations remain to be 
overcome before ligand design becomes reli- 
able and routine. Many deficiencies in molec- 
ular mechanics previously cited remain that 
limit reliability. Adequate modeling of electro- 
statics remains elusive in many experimental 
systems of interest such as membranes. 
Newer derivations of force fields, such as MM3 
(27, 296 and references therein), CHARMM 
(297, 298), AMBER/OPLS (157), ECEPP 
(299), and others (156, 300), are attempting to 
more accurately represent the experimental 
data, whereas others include a broader spectrum
of chemistry such as metals (29-31, 301-305).
Combinations of molecular mechanics
with quantum chemistry (159, 160, 162, 306)
are clearly necessary for problems in which
chemical transformations are involved.

Figure 3.22. Use of systematic search to explore
possible binding modes of mechanism-based inhibitors
of chymotrypsin (284) by rotation of six bonds
(*), which orient carbonyl of substrate relative to
hydroxyl (Du) of Ser-195.

Rather amazing agreement between calcula- 
tion and experiment has been reported (1^5, 
307) on the relative stabilities of transition- 
state structures, although there is some con- 
troversy (308) regarding this approach. In any 
case, this is another area of rapid growth as 
adequate computational resources become 
available. Riley et al. (309, 310) found an excellent
correlation between the relative stabilities
of conformers in manganese complexes of
pentaazacrowns and their ability to catalyze 
the dismutation of superoxide. 

3.4 Calculation of Affinity (260) 

3.4.1 Components of Binding Affinity 
(255). The ability to calculate the affinity of 
prospective ligands based on the known three- 
dimensional structure of the therapeutic tar- 
get would allow prioritization of synthetic tar- 
gets. It would bring quantitation to the 
qualitative visualization of a potential ligand 
in the receptor site. Although this problem has 
been solved in principle, in practice, direct ap- 
plication of molecular mechanics has not yet 








Figure 3.23. Vancomycin-peptide complex used by
Williams et al. (311-315) to investigate components
of free energy of binding.



proved to be a reliable indicator. The reasons 
behind this difficulty become more obvious if 
one dichotomizes the free energy of binding 
into a logical set of components. 

For example, Williams (311-314) used a
vancomycin-peptide complex (Fig. 3.23) as an 
experimental system in which to evaluate the 
various contributions to binding affinity. A 
similar analysis for antibody mutants was at- 
tempted by Novotny (262). 

$$\Delta G = \Delta G_{(trans+rot)} + \Delta G_{rotors} + \Delta H_{conform} + \sum \Delta G_{i} + \Delta G_{vdW} + \Delta G_{H}$$

where $\Delta G_{(trans+rot)}$ is the free energy associated
with translational and rotational freedom
of the ligand. This has an adverse effect
on binding of 50-70 kJ/mol (12-17 kcal/mol)
at room temperature for ligands of 100-300
Da, assuming complete loss of relative translational
and rotational freedom. $\Delta G_{rotors}$ is the
free energy associated with the number of rotational
degrees of freedom frozen. This is 5-6
kJ/mol (1.2-1.6 kcal/mol) per rotatable bond,
assuming complete loss of rotational freedom.
$\Delta H_{conform}$ is the strain energy introduced by
complex formation (deformation in bond
lengths, bond angles, torsional angles, etc.
from solution states); $\sum \Delta G_{i}$ is the sum of
interaction free energies between polar groups;
$\Delta G_{vdW}$ is the energy derived from enhanced
van der Waals interactions in complex; and
$\Delta G_{H}$ is the free energy attributed to the hydrophobic
effect (0.125 kJ/mol per Å$^{2}$ of hydrocarbon
surface removed from solvent by complex
formation).
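
A short numerical sketch shows how these components combine. It assumes a sign convention in which unfavorable contributions are positive and favorable ones negative, uses mid-range values from the text for the translational/rotational and rotor terms, and takes the hydrophobic term as 0.125 kJ/mol per square angstrom of buried hydrocarbon surface. The number of rotors, the polar interaction energies, and the buried area are invented for illustration.

```python
def binding_free_energy(trans_rot, n_rotors, per_rotor, conform_strain,
                        polar_terms, vdw, buried_area, per_area=0.125):
    """Sum the free energy components (kJ/mol); negative favors binding."""
    rotors = n_rotors * per_rotor          # cost of freezing rotatable bonds
    hydrophobic = -per_area * buried_area  # gain from buried hydrocarbon surface
    return (trans_rot + rotors + conform_strain
            + sum(polar_terms) + vdw + hydrophobic)


# Hypothetical ligand: 3 frozen rotors, 5 polar interactions of -8 kJ/mol
# each, 400 square angstroms of buried hydrocarbon surface, and no strain
# or extra van der Waals gain.
estimate = binding_free_energy(
    trans_rot=60.0,
    n_rotors=3,
    per_rotor=5.5,
    conform_strain=0.0,
    polar_terms=[-8.0] * 5,
    vdw=0.0,
    buried_area=400.0,
)
```

The result, -13.5 kJ/mol, is a small difference between large opposing terms, which is why modest errors in any one component can dominate the estimate.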

Through use of this analysis on the dipep- 
tide-vancomycin system, estimates of the con- 
tribution of the hydrogen bonds to binding 
were made (312) that were considerably 
higher (-24 kJ/mol, -6 kcal/mol) than those
derived experimentally. The most likely 
source of error is the assumption of complete 
loss of relative and internal entropy upon 
binding. In retrospect, Searle and Williams 
(313) examined the thermodynamics of subli- 
mation of organic compounds without inter- 
nal rotors, and showed that only 40-70% of 
theoretical entropy loss occurs on crystalliza- 
tion. This provides an estimate of the entropy 
loss to be expected on drug-receptor interaction.
Applying this correction to the peptide-vanco- 
mycin system led (314) to a more conventional 
view of the hydrogen bond of between -2 and 
-8 kJ/mol (0.5-2.0 kcal/mol). Because several
of the components in the binding energy esti- 
mate are directly related to the degree of order 
of the system (entropy), simulations in solvent 
may be necessary to quantitate the degree by 
which the relative motions of the ligand and 
protein are quenched and the restriction on
rotational degrees of freedom upon complex- 
ation. Aqvist (316, 317) developed the linear 
interaction energy (LIE) method for calculat- 
ing the ligand-binding free energies from mo- 
lecular dynamics simulations. Verkhivker et 
al. (318) developed a hierarchical computa- 
tional approach to structure and affinity pre- 
diction in which dynamics is combined with a 
simplified, knowledge-based energy function. 
Despite the focus on short peptides interacting 
with the SH2 domain with exhaustive calori- 
metric determination of binding entropy, en- 
thalpy, and heat capacity changes, the overall 
correlation between computed and experimen- 
tal binding affinity remained rather modest. 
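
The linear interaction energy estimate mentioned above has a simple functional form, sketched here. The binding free energy is approximated as a weighted sum of the changes in the average ligand-surroundings van der Waals and electrostatic energies between the bound and free simulations. The coefficients 0.16 and 0.5 are the values commonly quoted for the original parameterization and are stated here as an assumption; the energy samples are invented.

```python
def mean(values):
    """Arithmetic mean of a sequence of energy samples."""
    return sum(values) / len(values)


def lie_binding_free_energy(vdw_bound, vdw_free, el_bound, el_free,
                            alpha=0.16, beta=0.5):
    """alpha * d<V_vdw> + beta * d<V_el>, differences taken bound - free."""
    d_vdw = mean(vdw_bound) - mean(vdw_free)
    d_el = mean(el_bound) - mean(el_free)
    return alpha * d_vdw + beta * d_el


# Hypothetical ligand-surroundings energies (kcal/mol) sampled from two
# simulations: ligand in the solvated complex and ligand free in water.
dg = lie_binding_free_energy(
    vdw_bound=[-30.0, -32.0, -31.0],
    vdw_free=[-18.0, -20.0, -19.0],
    el_bound=[-50.0, -52.0, -51.0],
    el_free=[-44.0, -46.0, -45.0],
)
```

Only the two end states are simulated, so no unphysical intermediate states are needed.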

3.4.2 Binding Energetics and Compari- 
sons. Because of the difficulties in calculating 
binding free energies (see below), attempts to 
use $\Delta H$ as a means of correlation with binding
affinities have often appeared in the literature,
sometimes meeting with considerable
success. These successes, however, are fortuitous
and depend on simplifying assumptions
as well as the well-known correlation (319) between
$\Delta H$ and $\Delta G$, which has been suggested
as an unusual property of the solvent water. A 
similar correlation has been observed in non- 
aqueous systems and relates to higher entropy 
loss associated with stronger enthalpic interactions
(313). It is a common assumption with
congeneric series that the desolvation ener- 
gies and entropic effects will be approximately 
the same across members of the series. This, 
often tacit, assumption may hold for most of 
the series, but complex formation is depen- 
dent on the total energetics of the complex, 
and what may appear a relatively innocuous 
change in a substituent may trigger a different 
binding mode in which the ligand has reori- 
ented. This will likely have an impact on de- 
solvation as well as entropic effects, in that the 
interactions of the majority of the ligand have 
changed environment. 

3.4.3 Atom-Pair Interaction Potentials. Af- 
finities can be calculated based on ligand-re- 
ceptor atom-pair interaction potentials that 
are statistical in nature rather than empirical. 
Muegge and Martin (320) derived these poten- 
tials from crystallographic data in the Protein 



Data Bank, drawing on hundreds or thou- 
sands of examples of each interaction type. 
Grzybowski et al. (321) combined a knowl- 
edge-based potential with a Monte Carlo 
growth algorithm that generated a very potent 
inhibitor of human carbonic anhydrase (322). 
The resulting equation for all the atom-pair 
interactions in a protein-ligand complex can 
yield free energies directly, given that solva- 
tion and entropic terms are treated implicitly. 
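
As a rough illustration of how a statistical atom-pair potential of this kind might be derived, the following Python sketch converts pair-distance counts into a distance-dependent score with the inverse Boltzmann relation, score(r) = -kT ln[p_obs(r)/p_ref(r)]. The bin counts, the choice of reference state, the neutral score for empty bins, and the function names are all hypothetical; they are not taken from references 320-322.

```python
import math

KT = 0.593  # kcal/mol at roughly 298 K

def pair_potential(observed_counts, reference_counts, kt=KT):
    """Return one score per distance bin: -kT * ln(p_obs / p_ref)."""
    total_obs = sum(observed_counts)
    total_ref = sum(reference_counts)
    scores = []
    for n_obs, n_ref in zip(observed_counts, reference_counts):
        if n_obs == 0 or n_ref == 0:
            scores.append(0.0)  # no statistics: treated as neutral (assumption)
            continue
        p_obs = n_obs / total_obs
        p_ref = n_ref / total_ref
        scores.append(-kt * math.log(p_obs / p_ref))
    return scores

def score_complex(distances, scores, bin_width=1.0):
    """Sum the bin scores over all ligand-protein atom-pair distances (in Å)."""
    total = 0.0
    for d in distances:
        index = int(d // bin_width)
        if index < len(scores):
            total += scores[index]
    return total

# Hypothetical counts for one atom-type pair in 1 Å bins (0-1, 1-2, ... Å)
observed = [0, 0, 5, 40, 30, 25]
reference = [0, 0, 20, 25, 25, 30]
potential = pair_potential(observed, reference)
print([round(v, 2) for v in potential])
print(round(score_complex([3.4, 3.8, 5.2], potential), 2))
```

Bins in which a contact is seen more often than the reference state predicts receive a negative (favorable) score. Summing over all pairs gives a single number for the complex in which solvation and entropy enter only implicitly, through the statistics.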

3.4.4 Simulations and the Thermodynamic
Cycle. Given a known structure of a drug-receptor
complex with a measured affinity of the
ligand, the thermodynamic cycle paradigm
allows calculation of the difference in affinity
(ΔΔG) for a novel ligand. Bash et al. (136)
successfully calculated the effect of changing a 
phosphoramidate group (P-NH) to a phosphate
ester (P-O) in transition-state analog
inhibitors of thermolysin (Fig. 3.24). The
difference in free energy between a
benzenesulfonamide and its p-chloro derivative
as an inhibitor of carbonic anhydrase has been
calculated (323) as well. This is similar to the 
original application to enzyme-ligand work on 
benzamidine inhibitors of trypsin, in which 
the mutation of a proton to a fluorine was cal- 
culated (324). Hansen and Kollman (325) cal- 
culated differences in the free energy of bind- 
ing of an inhibitor of adenosine deaminase as 
one changes a proton to a hydroxyl group by 
use of a model of the active site. Other exam- 
ples (326-328) looked at the difference in 
binding of two stereoisomers of a transition- 
state inhibitor of HIV protease (Fig. 3.25) and 
the affinity of DHFR for methotrexate analogs 
(329). One obvious conclusion can be drawn: 
successful applications in the literature deal 
with relatively minor perturbations to a struc- 
ture where there is less chance that the bind- 
ing mode might be altered. 
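
The cycle behind these calculations can be written out explicitly. In the notation assumed here (it is not the chapter's own), L is the reference ligand, L' the novel ligand, and the two "mutation" free energies are those of the nonphysical transformation of L into L', computed once in the complex and once free in solvent. Because free energy is a state function, the sum around the closed cycle is zero, which gives:

```latex
\begin{aligned}
\Delta\Delta G_{\mathrm{bind}}
  &= \Delta G_{\mathrm{bind}}(L') - \Delta G_{\mathrm{bind}}(L) \\
  &= \Delta G_{L \to L'}^{\mathrm{complex}} - \Delta G_{L \to L'}^{\mathrm{solvent}}
\end{aligned}
```

Only the two mutation legs are simulated. The physical binding legs, which would require sampling the full association process, never need to be computed. This is also why the approach is most reliable for small perturbations.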

There is at least one example in the literature
(330) in which the calculated affinity difference
did not agree with the experimental
data [binding of an antiviral agent to human
rhinovirus HRV-14 and to a mutant virus in
which a valine was mutated to a leucine (Fig.
3.26)]. Here a β-branched amino acid (Val)
was converted into Leu, which, besides adding
a methyl group, lacks the isopropyl side chain
adjacent to the peptide backbone.

Figure 3.24. Calculated (136) difference in
affinity (ΔΔG) compared with experimental value
for two inhibitors of thermolysin.



The differences between calculation and ex- 
perimental data may be related to rotational 
isomerism of the side chains that can be ex- 
plicitly included (331). Despite the successful 
examples of this approach that appear in the 
literature, there exists a growing healthy 
skepticism regarding its general application. 
In a discussion (332) of the application of
simulations to prediction of the changes in protein
stability attributed to amino acid mutation,
problems in adequate sampling, particularly
of the unfolded state, as well as difficulties
with electrostatics were cited. A review of
applications by Kollman (134) cites numerous
other examples.

3.4.5 Multiple Binding Modes. Realistically,
congeneric series, although a useful
construct, exist only in the mind of the
medicinal chemist. The orientation of the drug in
the active site depends on a multitude of
interactions, and a minor perturbation in structure
can destabilize the predominant binding mode
in favor of another. As examples, detailed
analyses of the multiple binding modes shown
with thyroxine analogs (334) by transthyretin,
a transport protein, and enkephalin analogs
(335) by an Fab fragment have been made
through crystallography. For this reason, the
probability of correct answers with
thermodynamic integration studies is directly related to
the similarity in structure between the ligand
of interest and the reference compound. All
three-dimensional methods for predicting
affinity require a fundamental assumption
about the binding mode (in other words, an
orientation rule for aligning compounds in the
model). Examination of series of ligands
binding to the same site usually includes examples
of similar compounds that have different
binding modes [e.g., the change in orientation (Fig.
3.25) of the C-terminal portion of the Roche
HIV protease inhibitor compared with
JG-365] (333). Molecular modeling is
currently capable of distinguishing correctly in
many cases between alternate binding modes
of the same ligand. Many components
(desolvation, entropy of binding, etc., of the ligand)
that cloud the issue of direct calculation of
affinities are constant when comparing
binding modes of the same compound and,
therefore, do not have to be evaluated. The
computational cost of exploring possible binding
modes within the active site is nontrivial,
however, especially when the protein is capable of
reorganizing to expose alternative sites, as
was the case for a series of ligands for
hemoglobin (181).

Figure 3.25. Structures of JG-365 and
Ro 31-8959 (Roche), in which chirality at the
crucial transition-state hydroxyl is reversed for
optimal binding in the two analogs. An
alteration in binding mode was predicted
(333) to explain this observation and was
subsequently confirmed by crystallography.

Figure 3.26. Calculated (330)
relative affinity of a Sterling-Winthrop
antiviral that binds to
rhinovirus coat protein (HRV-14)
and to the V188L mutant. Biological
data indicate that the V188L mutation
drastically diminishes activity
of the antiviral.

In a similar fashion, it is generally assumed
from the competitive behavior for binding
shown by many agonists and antagonists that
they bind at the same site on the receptor
(certainly, the simplest hypothesis). Recent
studies on G-protein-coupled receptors indicate
that agonists and antagonists often have
different binding sites, given that mutations in
the receptor can affect the binding of one and
not the other. An example of such a study on
the angiotensin II receptor has been published
(336). This story is only beginning to unfold,
but appears to be a general phenomenon in
G-protein-coupled receptors (337, 338). Examples of
this phenomenon have been reported with
antagonists derived from screening, where the
structures of antagonist and agonist differ
dramatically, but also where the antagonists were
obtained by minor structural modification of
the natural agonist.

3.5 Protein Structure Prediction 

Prediction methods for generating the 3D 
structure of a protein based on its sequence 
alone fall into several categories. There are 
hierarchical methods that predict secondary 
structures and then attempt to fold those ele- 
ments together. There are simulation meth- 
ods that attempt to fold the protein through 
the use of models of reduced complexity and 
then refine the prediction by using them to 
constrain all-atom models. Additionally, there 
are hybrids of these approaches that rely 
heavily on heuristics. These methods have 
been successful in limited cases in the hands of 
their authors, but have generally been found 
lacking when tested by others in a more thor- 
ough and objective manner. Nevertheless, 
partial successes indicate that signal has be- 
gun to emerge from the smoke and mirrors. 

3.5.1 Homology Modeling. Often, the crystal
structure of the therapeutic target is not
available, but the three-dimensional structure
of a homologous protein will have been
determined. Depending on the degree of homology
between the two proteins, it may be useful to
model-build the structure of the unknown
protein based on the known structure. Many
models (339-341) of the various
G-protein-coupled receptors have been built based on
homology with bacteriorhodopsin. Models of
the three-dimensional structures of human
renin (342) and HIV protease (343, 344) were
built from crystal structures of homologous
aspartyl proteinases as aids to drug design.
The known structures of serine proteases
have served as templates for models of
phospholipase A2 (345) and convertases or
subtilases (346). The crystal structure of the MHC
class I receptor served to generate a
hypothetical model of the foreign antigen-binding site
of class II histocompatibility molecules (347).
Models of human cytochrome P450s have
been built by homology as well (348).

One of the major difficulties facing con- 
struction of such models is the alignment 
problem that is compounded by multiple
insertions and/or deletions. As the number of
known homologous sequences increases, the
alignment problem is lessened by consensus
criteria. Although the interior core of the
proteins is often quite similar, significant
alterations can occur on surface loops, and much
effort has been expended to fold these loops
(123, 349). With regard to the utility of such
models in drug design, one can expect that
they will prove useful conceptually, but that
the molecular details required for optimizing 
specificity, for example, would be deficient. 
One tries to exploit the often subtle differ- 
ences that arise from sequence changes, which 
are reflected in the three-dimensional struc- 
ture. Models built by homology would be ex- 
pected to be weakest in those areas in which 
sequence differences were greatest. 

3.5.2 Inverse Folding and Threading
(350-353). This is the ultimate in motif
recognition. One makes use of the ever-increasing
database of known three-dimensional structures
to generate a set of 3D folding motifs for
proteins. The sequence of an unknown structure
is systematically forced to adopt the
coordinates of overlapping segments of the 3D motif
and its energy evaluated. In essence, the local
multibodied interactions induced by the 3D
constraints are evaluated with an empirical
pseudopotential that has been calibrated on
the PDB database (354, 355) and that is
capable of returning a low energy for native
sequences compared with scrambled sequences
or proteins with other 3D structures. If one
cannot discriminate native structures from
other folding motifs, then there is little chance
that an unknown sequence, which folds in a
similar 3D pattern, would be discriminated.
The basic assumption is that 3D homology
exists between the test sequence and some
sequence represented in the motif database.
This is not necessarily true, because as
many as 40% of the new structures determined
by crystallography have no known 3D
homologs. In fact, in an analysis of the genomes
of several sequenced microorganisms (356), no
more than 12% of the deduced proteins had
detectable homology with proteins of known
structure. In the CASP competition, however,
the most predictive success has been with this
approach when a 3D homology existed.

One interesting question that arises is the
number of protein motifs that exist. One way
to approximate this is to assume
random sampling of protein motif space
and then analyze the frequency of new motifs
in new crystal structures, which leads to an
estimate of approximately 1500 folds (357). Of
size of protein, ease of crystallization, abun- 
dance, and so forth. Lattice approaches give a 
maximal estimate of 4000 folds (358). Over 
1000 protein structures are known with ap- 
proximately 120 folds (351). 

At a more local level, proteins are gener- 
ated from a set of architectural building 
blocks, helices, sheets, turns, and so forth. If 
one can accurately determine the location of 
these structural elements within a sequence, 
then the assembly of these components
is significantly easier because the
degrees of freedom have been drastically re- 
duced. Unfortunately, our ability to accu- 
rately determine these elements of secondary 
structure seems to have peaked at the 75% 
accuracy level (359, 360). 

LINUS. LINUS (Local Independent Nucleating
Units of Structure) (361) is an
implementation of a hierarchical folding model in
which protein sequences are subdivided into 
overlapping 50-residue fragments to assess 
the algorithm effectiveness in predicting 
short- and medium-range interaction as well 
as to limit computational complexity. The al- 
gorithms accumulate favorable structures 
within a sequence window, and repeat the pro- 
cess as the window is allowed to grow over the 
sequence. Obviously, this is an embodiment of 
the principle of hierarchical condensation of 
local initiation of folding. At the beginning, 
the segment length is six and the starting con- 
formation set to all extended backbone. Start- 
ing at the N-terminus of the segment, three- 
residue subfragments are perturbed with 
backbone torsional values from a library to 
give a trial conformation. If two atoms over- 
lap, the trial conformation is rejected. Other- 
wise, the energy is evaluated and selection de- 
pends on the Metropolis criterion. For each 
interaction cycle, 6000 iterations of this proce- 
dure are performed, 1000 iterations for equi- 
librium and 5000 samples. Conformations of 
chain segments that give a high frequency in 
the sample are frozen and the segment size 
increased. Backbone atoms and highly simpli- 
fied side chains are used in the simulations. 
The simplified energy function has a vdW 
term, a hydrogen-bonding term, and a back- 
bone torsional term. 
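
The sampling loop described above can be sketched in Python as follows. This is a minimal illustration of the Metropolis criterion with an equilibration phase followed by a sampling phase. It is not the LINUS program: the three-state torsion library, the toy energy function, and the temperature (in the same arbitrary units as the energy) are hypothetical, and the steric-overlap rejection step is omitted.

```python
import math
import random

def metropolis_accept(delta_e, temperature, rng):
    """Accept downhill moves always; uphill moves with probability exp(-dE/T)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

def run_cycle(energy, perturb, start, temperature=1.0,
              n_equil=1000, n_sample=5000, seed=0):
    """One cycle: n_equil equilibration steps, then n_sample recorded steps."""
    rng = random.Random(seed)
    state, e_state = start, energy(start)
    samples = []
    for step in range(n_equil + n_sample):
        trial = perturb(state, rng)
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e_state, temperature, rng):
            state, e_state = trial, e_trial
        if step >= n_equil:
            samples.append(state)  # record the current state, accepted or not
    return samples

LIBRARY = ("helix", "strand", "turn")  # hypothetical torsion classes

def toy_energy(conf):
    # Hypothetical score: each helical residue lowers the energy by 1 unit.
    return -float(sum(1 for s in conf if s == "helix"))

def perturb_triplet(conf, rng):
    # Reassign three consecutive residues from the library.
    start = rng.randrange(len(conf) - 2)
    new = list(conf)
    for i in range(start, start + 3):
        new[i] = rng.choice(LIBRARY)
    return tuple(new)

samples = run_cycle(toy_energy, perturb_triplet, ("strand",) * 6, temperature=0.5)
helix_fraction = [sum(c[i] == "helix" for c in samples) / len(samples)
                  for i in range(6)]
print([round(f, 2) for f in helix_fraction])
```

Positions whose most frequent state exceeds some threshold in the sample would then be frozen before the segment is lengthened. The threshold itself is a parameter that the text does not specify.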

Given the arbitrary fragmentation of the 
protein for computational efficiency, the pre- 
dicted secondary structures were surprisingly 
accurate for the five cases examined, with he- 
lical and sheet boundaries within two residues 
of their corresponding native structures.
Nevertheless, the rms differences were rather
large, from 3 to 9 Å. Certainly, these results
are quite encouraging and confirm the ideas
from studies on lattices by Dill (362) and
others that much of the secondary structure is
encoded into local patterns of hydrophobic 
and polar residues. 

GEOCORE (363). Amino acids are represented
at the united atom level with explicit
polar hydrogens with slightly reduced vdW
radii. The approach uses a discrete set of φ,ψ
values for each residue type: Gly has six, Pro
has three, and most others have four or five
values. A contact between nonpolar atoms
(carbon or sulfur) is worth -0.7 kcal/mol at
closest contact and scaled down from there.
Buried non-hydrogen-bonding groups get a
penalty of 1.5 kcal/mol. Polar conflicts in
which two donors or two acceptors are in
contact are given a similar penalty. A
constraint-based exhaustive search is used (systematic
search with limits such that no steric overlap
is allowed and that a compact structure is
generated), a branch-and-bound method that
guarantees that all globally or near-globally
optimal conformations will be found, while
neglecting less important conformations. The
compact structure is guaranteed by a volume
constraint about 60% higher than the volume
of a native protein of the same size. Side
chains are introduced in their most populated
rotameric state from the PDB and only
changed to an alternate rotamer to avoid a
vdW contact. Four proteins were used to test
the approach: avian pancreatic polypeptide
(1PPT), crambin (1CRN), melittin (2MLT),
and apamin (18 residues). Some 190 million
conformations were generated for 1PPT, with
8217 having an energy not more than 16
kcal/mol above the optimum found. The
conformation with the lowest rms to the native
structure was within the 100 lowest energy
conformations found, but the true native
structure, evaluated with the same energy
function, had an energy 3-10% lower than that
of any conformer found. This implies that the
major problem was conformational sampling,
not just an oversimplified potential function.

Genetic Algorithm. Le Grand and Merz 
(364) applied the genetic algorithm to a model 
of proteins using a rotamer library and the 
AMBER potential function. In a second study, 
they used a fragment library and a knowledge- 
based potential function. Sun (365) used a 
fragment library consisting of di- to pentapep- 
tides and the Sippl potential. He predicted 
the structures of melittin, avian pancreatic
polypeptide, and apamin (both fragments
from apamin and APP were included in the
library, so it is not so surprising that the rms
agreement for these two was around 1.5 Å).
Bowie and Eisenberg (366) used the genetic 
algorithm with a fragment library of from 9 to 
25 residues and their own knowledge-based 
potential. The fragment most similar to that 
of the sequence based on 3D profiles (367) was
chosen. They were able to fold 50-residue
fragments to within 4.0 Å based on the error in the
distance matrix. This avoids the problem of 
embedding and generating the wrong chiral- 
ity, which reduces the error estimate. 

3.5.3 Contact Matrix. Instead of searching 
the three-dimensional coordinate space, one 
can reduce dimensionality by focusing on gen- 
erating an optimal contact map in 2D (368). 
The 3D coordinates of a correct contact map 
can be generated within 1 Å rms for the carbon
alphas by distance geometry (369) or other 
methods (370). By use of the powers of the 
contact matrix as constraints that limit the 
contact matrices to compact structures, explo- 
ration of various potential interactions be- 
tween secondary structural elements can be 
done efficiently. Because of the limited
predictive ability of current secondary structure
prediction paradigms, a set of plausible inputs to
this procedure needs to be generated, and the
best structures that are derived evaluated
further. This may be an efficient low resolution
model builder and have some of the computa- 
tional advantages of the hydrophobic core con- 
straints used by Dill and coworkers. This ap- 
proach based on geometrical constraints was 
originally proposed by Kuntz et al. in 1976 
(371). The matrices of residue-residue con- 
tacts provide, at the very least, a significant 
partial solution to the prediction of long-range 
intersegmental contacts through a formalism 
explicitly describing the structure and some 
structure-related properties of a protein glob- 
ule in terms of matrices of residue-residue 
contacts without explicit knowledge of second- 
ary structure predictions, although they can 
be a useful source of constraints. In many 
ways, the success of this approach verifies the 
conclusions based on lattice models that sec- 
ondary structures are implicit in the pattern 
of hydrophobic and hydrophilic residues and 
the requirements of compactness. The resi- 
due-residue contact matrices have some spe- 
cial properties as mathematical objects that 
can encode the geometrical requirements of 
compactness; the knowledge of these allows
their treatment, starting with the sequence to
generate a contact matrix that is consistent
with a compact structure. This is done within
the framework of a simple and readily
formalized geometric model.

The system of intraglobular residue-resi- 
due contacts of a protein of N residues may be 
represented as an N × N matrix of the
carbon-alphas, whose elements are ones (contact) or
zeros (lack of contact). Any reasonable defini- 
tion of contact provides ones in the positions 
(i, i + 1) that correspond to a peptide bond 
between two adjacent residues in the se- 
quence. The same is true for the residues cor- 
responding to the pair of cysteines forming a 
disulfide bond (these data may not be available 
as input and may be used as a test of correct 
prediction). This set of contacts describes the 
sequential covalent topology and is a constant 
part of the contact matrix which does not de- 
pend on the spatial structure of the polypep- 
tide chain; however, any additional informa- 
tion on existing intraglobular contacts (e.g., 
from NMR data or disulfide linkage) can easily 
be introduced in the constant part of the 
contact matrix A: 

$$A^{0} = \mathrm{const.} \tag{3.1}$$

The number of contacts involving a given
residue (the coordination number of the ith
residue),

$$n_i = \sum_j a_{ij} \tag{3.2}$$

is assumed to be approximately constant
and is determined by a
separate algorithm based on residue type and
position in the sequence as well as predicted
secondary structure.

A very important condition of spatial con- 
sistency of any given contact system is defined 
by the relation 

$$b_{ij} = \sum_k a_{ik} a_{kj} \ge c, \quad \text{if } a_{ij} = 1 \tag{3.3}$$

In other words, the squared matrix of A 
should have its elements not less than c at any 
position where there is a nonzero element in 
matrix A. More generally, there exists a set of 
specific constraints regulating the
relationships of A with its powers $A^2$, $A^3$, and so forth.
These relations are entirely analogous with
those known from graph theory for
connectivity (adjacency) matrices. The elements of the
squared matrix represent the number of paths 
of length two, the cubed matrix, the number of 
paths of length three, and so forth. Finally, an 
obvious property of matrix A is its symmetry 
(for all contact definitions considered so far, if 
the ith residue is in contact with the jth, the 
jth residue is in contact with the ith, also). 

$$a_{ij} = a_{ji} \tag{3.4}$$

Thus, conditions 3.1-3.4 define the set of
matrices A that correspond to spatially
consistent, compact structures of protein chains.
Besides these general conditions, mainly of
geometrical origin, any matrix A describing
the structure of a real protein molecule should
also possess several more specific properties
that may be derived from studies of the
general properties of protein structures as
exemplified in the Brookhaven Protein Databank.
The central idea of the approach is to use both
the general and specific properties of the
contact matrix and its powers for the design of a
gain (energy, penalty) function, $\Phi(A)$, so that
the task of determining an appropriate
intraglobular contact matrix might be formulated
as a problem of maximization of $\Phi(A)$,

$$\Phi(A) \to \max_{A} \tag{3.5}$$

with respect to A under conditions 3.1-3.4. In
the simplest and clearest form, $\Phi(A)$ may be
expressed in terms of the probabilities of
contact between the residues of different types (or
groups), $q_{ij}$. The solution of the problem
provides the most probable residue-residue
contact matrix A,

$$\Phi(A) = \prod_{\text{all contacts}} q_{ij} \to \max_{A} \tag{3.6}$$

in the sense of the maximum likelihood
principle. This condition may be rewritten in
the form

$$\Phi'(A) = \sum_{\text{all contacts}} \ln q_{ij} \to \max_{A} \tag{3.7}$$

It is clear that proper formulation and param- 
eterization of this problem need the analysis 
of the voluminous experimental data on pro- 
tein structure to derive the specific properties 
to be emulated. 
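
To make the contact-matrix conditions concrete, the following Python sketch builds a binary contact matrix from Cα coordinates and checks the symmetry (3.4) and spatial-consistency (3.3) conditions. It also evaluates the coordination numbers (3.2) and the log-likelihood score (3.7). The 7 Å contact cutoff, the threshold c, the two-letter residue classes (H, P), and the contact probabilities q are illustrative assumptions, not values from the cited work.

```python
import math

def contact_matrix(coords, cutoff=7.0):
    """Binary N x N matrix: 1 if two C-alpha atoms lie within `cutoff` Å."""
    n = len(coords)
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                a[i][j] = a[j][i] = 1
    return a

def coordination_numbers(a):
    """n_i = sum_j a_ij (Eq. 3.2)."""
    return [sum(row) for row in a]

def matrix_square(a):
    """b_ij = sum_k a_ik a_kj: the number of two-step paths from i to j."""
    n = len(a)
    return [[sum(a[i][k] * a[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_symmetric(a):
    """a_ij = a_ji (Eq. 3.4)."""
    n = len(a)
    return all(a[i][j] == a[j][i] for i in range(n) for j in range(n))

def satisfies_consistency(a, c):
    """b_ij >= c wherever a_ij = 1 (Eq. 3.3)."""
    b = matrix_square(a)
    n = len(a)
    return all(b[i][j] >= c
               for i in range(n) for j in range(n) if a[i][j] == 1)

def log_likelihood(a, residue_types, q):
    """Phi'(A) = sum over contacts of ln q_ij (Eq. 3.7)."""
    total = 0.0
    n = len(a)
    for i in range(n):
        for j in range(i + 1, n):
            if a[i][j] == 1:
                pair = tuple(sorted((residue_types[i], residue_types[j])))
                total += math.log(q[pair])
    return total

# Four C-alpha atoms on a 3.8 Å square (compact) versus on a line (extended)
compact = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
extended = [(3.8 * i, 0.0, 0.0) for i in range(4)]
q = {("H", "H"): 0.6, ("H", "P"): 0.3, ("P", "P"): 0.1}  # hypothetical

a_compact = contact_matrix(compact)
a_extended = contact_matrix(extended)
print(coordination_numbers(a_compact), satisfies_consistency(a_compact, 2))
print(coordination_numbers(a_extended), satisfies_consistency(a_extended, 1))
print(round(log_likelihood(a_compact, "HPHP", q), 3))
```

In the extended chain every contact is a bare sequential neighbor with no shared third residue, so condition 3.3 fails even for c = 1. In the compact arrangement each contacting pair shares two common neighbors. This is the sense in which the powers of A encode compactness.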

This methodology has been used to predict 
the structure of loops of helical-bundle pro- 
teins, given the positions of the connection to 
the helices (372). Because of the uncertainties 
in secondary structure predictions that are 
used as inputs to constrain the search, any 
single prediction of the method must be 
viewed with skepticism. Development of
scoring functions that discriminate between
alternative models at the Cα level of resolution
would complement this approach. 

Distance Geometry. Aszodi et al. (373-375) 
explored the use of distance geometry as the 
metric for comparative modeling of struc- 
tures. In the CASP2 target set, the methods 
generated an overall Ca rmsd of 1.85 A for 
glutathione transferase based on close ho- 
mologs with known structure. It had more dif- 
ficulty with PNSl and built models based on 
two different proteins. The correct fold was 
not obvious based on the CHARMM energy 
values for the two models. 

Neural Networks. PROBE (376) is an inte- 
grated suite of neural network modules that 
predicts folding motif, secondary structure per 
residue, location of disulfide bonds, and
surface accessibility of each residue. No critical
assessment of the accuracy of the results from
this package was given in the description, but
the package is available for evaluation.

Discrimination Between Folds. Because of 
the inherent error in potential functions, sec- 
ondary structure prediction methods, limited 
sampling, and so forth, one can anticipate that 
prediction of a variety of alternative struc- 
tures (perhaps, by several methods) would be 
more likely to generate a correctly folded 
structure than any single prediction. The 
problem then becomes one of discriminating 
between the correct structure and
alternatives that may be very similar in overall
quality of fold. Park et al. (377) evaluated the
ability of 18 low and medium resolution energy
functions to discriminate correct from
incorrect folds. Functions that were effective in
protein threading were not competitive in
discriminating the X-ray structure from
ensembles of plausible structures, and vice versa.
Obviously, these empirical functions have
been derived to optimize their discriminatory
abilities for a given problem class and the
training (selection) sets were different. In
other words, the true physics has not been
captured by any of the methods. Crippen (378)
also raised serious doubts concerning the
ability of "empirical" energy functions to identify
correctly folded structures based on studies
with simple lattice models. Thomas and Dill
(379) described an iterative approach,
ENERGI, to generate pairwise residue "energy"
scores from the PDB. This is one alternative to
the Boltzmann-based pairing frequency
analysis used by others (380). Lattice simulations
show that the assumption that pairing
frequencies are independent does not hold and,
therefore, the underlying assumption of the
Boltzmann approach is flawed. The study, which used
two different sets of proteins to thread, was
able to classify 88% of 121 proteins having less
than 25% homology and no homologs in the
training set. The method appears to separate
interactive free energies from chain
configurational entropies and thus give a more realistic
estimate.



4 UNKNOWN RECEPTORS 

Until recently, receptors were hypothetical 
macromolecules whose existence was postu- 
lated on the basis of pharmacological experi- 
ments. Although recent advances in molecular 
biology have led to cloning and expression of 
many of those receptors whose existence was 
postulated as well as a plethora of subtypes, 
progress in most cases in defining their three- 
dimensional structure has yet to provide the 
medicinal chemist with the necessary atomic 
detail to design novel compounds. Without
detailed information about the
three-dimensional nature of the receptor, conventional
computationally based approaches, such as
molecular dynamics and the Monte Carlo
method, are not possible. One can only
attempt to deduce an operational model of the
receptor that gives a consistent explanation of 
the known data and, ideally, provides predic- 
tive value when considering new compounds 
for synthesis and biological testing. The utility 
of such an approach has been demonstrated by 
Bures et al. (235), who used the pharmaco- 
phoric pattern derived for the plant hormone 
auxin, to find four novel classes of active com- 
pounds by searching a corporate three-dimen- 
sional database of structures. In many ways, 
the approach that has evolved is analogous to 
the American parlor game of 20 questions, in 
which the medicinal chemist poses the ques- 
tions in terms of novel three-dimensional 
chemical structures and attempts to interpret 
the response of the receptor in a consistent 
manner. The underlying hypothesis is a struc- 
tural complementarity between the receptor 
and compounds that bind. In the same way 
that the receptor's existence could be deduced 
based on pharmacological data, some low res- 
olution three-dimensional schematic of the re- 
ceptor, at least with regard to the active site or 
binding pocket, can be deduced by analysis of 
structure-activity data. It is the purpose of 
this section to summarize the current ap- 
proaches in use for receptors of unknown 
three-dimensional structure and evaluate 
their utility. For purposes of this section, re- 
ceptor is often used in a completely generic 
sense, including enzymes and DNA, for exam- 
ple, as the macromolecular component (i.e., 
binding site) of recognition of biologically ac- 
tive small molecules. 

4.1 Pharmacophore versus Binding-Site 
Models 

4.1.1 Pharmacophore Models. It is often 
useful to assume that the receptor site is rigid 
and that structurally different drugs bind in 
conformations that present a similar steric 
and electronic pattern, the pharmacophore. 
Most drugs, because of inherent conforma- 
tional freedom, are capable of presenting a 
multitude of three-dimensional patterns to a 
receptor. The pharmacophoric assumption led 
to a problem statement that logically is
composed of two processes. First is the
determination, by chemical modification and biological
testing, of the relative importance of different
functional groups in the drug to receptor
recognition. This can give some indication of the
nature of the functional groups in the receptor
that are responsible for binding of the set of
drugs. Second, a hypothesis is proposed (Fig.
3.27) concerning correspondence, either
between functional groups (pharmacophore) in
different congeneric series of the drug or
between recognition site points postulated to
exist within the receptor (binding-site model).

Figure 3.27. (a) Pharmacophore hypothesis with correspondence of functional groups in drugs, A =
A', B = B', C = C'. (b) Binding-site hypothesis by use of drugs with hypothetical binding sites
attached (X, Y, and Z overlap).

The intellectual framework for use of 
structure-activity data to extrapolate infor- 
mation regarding the ligand's partner, the re- 
ceptor, is the concept of the pharmacophore. 
The pharmacophore, a concept introduced by 
Ehrlich at the turn of the 20th century, is the 
critical three-dimensional arrangement of mo- 
lecular fragments (or distribution of electron 
density) that is recognized by the receptor 
and, in the case of agonists, that causes subse- 
quent activation of the receptor upon binding. 
In other words, some parts of the molecule are 
essential for interaction, and they must be ca- 
pable of assuming a particular three-dimen- 
sional pattern that is complementary to the 
receptor to interact favorably. One corollary of 
the pharmacophoric concept is the ability to 
replace the chemical scaffold holding the
pharmacophoric groups with retention of activity.
This is the basis of the current activity (381,
382) in peptidomimetics, in which the amide
backbone of peptides has been replaced by
sugar rings, steroids (383, 384),
benzodiazepines (385), or carbocycles (386, 387) (Fig.
3.28). In the pharmacophoric hypothesis,
physical overlap of similar functional groups is
assumed; that is, the carboxyl group from
compound A physically overlaps with the
corresponding carboxyl group from compound B
and with the bioisosteric tetrazole ring of
compound C.
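
One simple way to test whether two conformers present the same pharmacophoric pattern is to compare the distances between corresponding functional groups, because these distances are unchanged by rotation and translation. The Python sketch below does this for labelled feature points. The feature names, coordinates, and 0.5 Å tolerance are hypothetical.

```python
import math
from itertools import combinations

def feature_distances(features):
    """Map each pair of feature labels to the distance between them (Å)."""
    return {
        (p, q): math.dist(features[p], features[q])
        for p, q in combinations(sorted(features), 2)
    }

def same_pattern(features_a, features_b, tolerance=0.5):
    """True if both conformers present the same labelled distance pattern."""
    if set(features_a) != set(features_b):
        return False
    d_a = feature_distances(features_a)
    d_b = feature_distances(features_b)
    return all(abs(d_a[pair] - d_b[pair]) <= tolerance for pair in d_a)

# Hypothetical feature positions (Å) for three compounds
drug_1 = {"acid": (0.0, 0.0, 0.0), "aromatic": (5.0, 0.0, 0.0),
          "amine": (0.0, 4.0, 0.0)}
# Same pattern as drug_1, rotated and translated
drug_2 = {"acid": (1.0, 1.0, 1.0), "aromatic": (1.0, 6.0, 1.0),
          "amine": (-3.0, 1.0, 1.0)}
# Aromatic group displaced by 3 Å
drug_3 = {"acid": (0.0, 0.0, 0.0), "aromatic": (8.0, 0.0, 0.0),
          "amine": (0.0, 4.0, 0.0)}

print(same_pattern(drug_1, drug_2))
print(same_pattern(drug_1, drug_3))
```

A distance comparison of this kind cannot distinguish a pattern from its mirror image, so a chirality check would be needed in practice. It also presupposes the correspondence of groups (A = A', B = B', C = C'), which is itself the hypothesis under test.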

One caveat that must be remembered is the 
probability of alternate, or multiple, binding 
modes. The interaction of a ligand with a bind- 
ing site depends on the free energy of binding, 
a complex interaction with both entropic and 
enthalpic components. Simple modifications 
in structure may favor one of several nearly 
energetically equivalent modes of interaction 
with the receptor, and change the correspon- 
dence between functional groups that has pre- 
viously been assumed and supported by exper- 
imental data. Changes in binding mode of an 
antibody Fab fragment to progesterone and
its analogs have been shown by crystallogra- 
phy (390,391) of the complexes. For this rea- 
son, analysis of agonists as a class is usually 
preferred, given that the necessity to both 

Figure 3.28. Peptidomimetics that have been designed based on iterative introduction of constraints
into the parent peptide and hypotheses concerning receptor-bound conformation. Enkephalin
mimetic (388), RGD platelet GPIIb/IIIa receptor antagonists (384, 385), thyroliberin [TRH (387)],
and somatostatin (383, 389). For an overview of recent approaches to peptidomimetic design, see the
review by Bursavich and Rich (382). [Structures not reproduced; the parent peptides labeled in the
figure include Tyr-Gly-Gly-Phe-Leu-OH (enkephalin) and -Arg-Gly-Asp- (RGD).]

bind and trigger a subsequent transduction
event is more restrictive than the simple re- 
quirement for binding shared by antagonists 
(336). Compounds that clearly are inconsis- 
tent with models derived from large amounts 
of structure-activity data may be indicative of 
such changes in binding mode, and may re- 
quire a separate structure-activity study to 
characterize their interaction. Despite its lim- 
itations, the pharmacophore approach is often 
the most appropriate because of lack of de- 
tailed information regarding the receptor and 
can yield useful insights, as seen in the case of 
clinical success with tyrosine kinase inhibitors 
(392,393) and other recent examples (394). 

4.1.2 Binding-Site Models. One major defi- 
ciency in the approach described above is the 
requirement for overlap of functional groups 
in accord with the pharmacophoric hypothe- 
sis. Although it is true that molecules having 
functional groups that show three-dimen- 
sional correspondence can interact with the 
same site, it is also true that a particular ge- 
ometry associated with one site is capable of 
interacting with equal affinity with a variety 
of orientations of the same functional groups. 
One has only to consider the cone of nearly 
equal energetic arrangements of a hydrogen- 
bond donor and acceptor to realize the prob- 
lem. Sufficient examples from crystal struc- 
tures of drug-enzyme complexes and from 
theoretical simulation of binding compel the 
realization that the pharmacophore is a limit- 
ing assumption. Clearly, the observed binding 
mode in a complex represents the optimal po- 
sition of the ligand in an asymmetric force 
field created by the receptor that is subject to 
perturbation from solvation and entropic con- 
siderations. Less restrictive is the assumption 
that the receptor-binding site remains rela- 
tively fixed in geometry when binding the se- 
ries of compounds under study. Experimental 
support for such a hypothesis can be found in 
crystal structures of enzyme-inhibitor com- 
plexes, where the enzyme presents essentially 
the same conformation, despite large varia- 
tions in inhibitor structures; studies of HIV-1 
protease complexed with diverse inhibitors 
support this view (171). 

In recent years, therefore, there has been
an increasing effort to focus on the groups of
the receptor that interact with ligands as being
the common features for recognition of a
set of analogs. When pharmacophore and 
binding-site hypotheses are compared, the 
binding-site model is physicochemically more
plausible, in that overlap of functional groups 
in binding to a receptor is more restrictive 
than assuming the site remains relatively 
fixed when binding different ligands. How- 
ever, the number of degrees of freedom in 
binding-site hypotheses, represented by the 
necessary addition of virtual bonds between 
groups A and X, B and Y, and C and Z in Fig. 
3.27, is greater. Additional degrees of freedom 
complicate subsequent conformational analy- 
ses and may preclude any conclusions unless a 
sufficiently diverse set of compounds is 
available. 

Other approaches to this problem have em- 
phasized comparison of molecular properties 
rather than atom correspondences. Kato et al. 
(395) developed a program that allows con- 
struction of a receptor cavity around a mole- 
cule emphasizing the electrostatic and hydro- 
gen-bonding capabilities. Other molecules can 
then be fit within the cavity to align them. 
This is similar in concept to the field-fit tech- 
niques available in the CoMFA module of 
SYBYL, in which the molecular field (electrostatic
and steric) surrounding a selected molecule
becomes the objective criterion for alignment
of subsequent molecules for analysis. An
example emphasizing molecular properties in 
pharmacophoric analysis was given by Moos et 
al. (396) on inhibitors of cAMP phosphodies- 
terase II. 

4.1.3 Molecular Extensions. If we assume 
the binding-site points remain fixed and can 
augment our drug with appropriate molecular 
extensions that include the binding site (i.e., a 
hydrogen-bond donor correctly positioned 
next to an acceptor), we can then examine the 
set of possible geometrical orientations of site 
points to see whether one is capable of binding 
all the ligands. Here, the basic assumption of 
rigid site points is more reasonable, at least for 
enzymes that have evolved to catalyze reac- 
tions and must, therefore, position critical 
groups in a specific three-dimensional ar- 
rangement to create the correct electronic en- 
vironment for catalysis. The program checks 
Figure 3.29. The use of active-site models in the
Active Analog Approach. The structure shown is one
of a series of ACE inhibitors analyzed. The thick
gray lines are noncovalent interactions between the
inhibitor and active-site points in the enzyme. The
dashed lines correspond to the six interatomic distances
monitored for each of the inhibitors analyzed.

this hypothesis by determining whether one
or more geometrical arrangements of the postulated
groups of site points is common to the
set of active compounds. Such a geometrical 
arrangement of receptor groups becomes a 
candidate binding-site model, which can be 
evaluated for predictive merit. 

In the study of the active site of angiotensin 
converting enzyme (ACE) by Mayer et al. 
(397), a binding-site model (Fig. 3.29) was used
by incorporating the active-site components 
as parts of each compound undergoing analy- 
sis. As an example, the sulfhydryl portion of 
captopril was extended to include a zinc bound 
at the experimentally optimal bond length and 
bond angle for zinc-sulfur complexes (Fig.
3.29). The orientation map (OMAP) (398),
which is a multidimensional representation of
the interatomic distances between pharma- 
cophoric groups (Fig. 3.30), was based on the 
distances between binding-site points such as 
the zinc atom with the introduction of more 
degrees of torsional freedom to accommodate 




Figure 3.30. Distances used in five-dimensional
OMAP used in analysis of ACE inhibitors.



the possible positioning of the zinc relative to 
ACE inhibitors such as captopril. Analyses of 
nearly 30 different chemical classes (Fig. 3.31) 
of ACE inhibitors led to a unique arrangement 
of the components of the active site postulated 
to be responsible for binding of the inhibitors. 
The displacement of the zinc atom in ACE to a 
location more distant from the carboxyl-binding
Arg seen in carboxypeptidase A is compatible
with the fact that ACE cleaves dipeptides
from the C-terminus of peptides, whereas car- 
boxypeptidase A cleaves single amino acid 
residues. 

Visualization of the OMAP is useful to 
judge the additional information introduced 
as each new compound is added (Fig. 3.32). 
Computationally, it is much more efficient to 
treat the set of noncongeneric compounds simultaneously
(111, 399), as we shall see, but it is
reassuring when identical results are obtained
if one uses the sequential procedure introduc- 
ing each molecule in turn, where intermediate 
results may be visually verified. The use of 
computer graphics to confirm intermediate 
processing of data in convenient display 
modes becomes increasingly more important 
as the individual computations and numbers 
of molecules under consideration increase. 
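
The sequential procedure described above can be illustrated with a small sketch. It assumes that each molecule has already been reduced to a list of distance tuples (one tuple per sterically allowed conformation, one entry per monitored distance); the data, the bin width, and the function names `omap` and `sequential_intersection` are hypothetical and are not taken from the original programs.

```python
def omap(distance_tuples, bin_width=0.5):
    """Discretize each tuple of monitored distances (in angstroms) into a grid cell."""
    return {tuple(int(d // bin_width) for d in t) for t in distance_tuples}


def sequential_intersection(omaps):
    """Introduce molecules one at a time and keep only the cells common to all.

    Returns the surviving cells and the number of cells left after each molecule,
    which is the quantity one would inspect visually at each step.
    """
    common = None
    history = []
    for cells in omaps:
        common = set(cells) if common is None else common & cells
        history.append(len(common))
    return common, history


# Hypothetical two-distance patterns for three active analogs.
molecule_1 = [(4.1, 6.2), (5.3, 6.9), (7.8, 3.1)]
molecule_2 = [(4.3, 6.4), (5.2, 6.6)]
molecule_3 = [(5.4, 6.7), (9.0, 9.0)]

common, history = sequential_intersection(
    [omap(m) for m in (molecule_1, molecule_2, molecule_3)]
)
```

In this toy case the number of candidate cells shrinks as each analog is added, and a single cell remains as the candidate pattern.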

4.1.4 Activity versus Affinity. Given a con- 
sistent model of either type, a limitation is 
that one can only ask whether the compound 
under consideration can present the three-di- 
mensional electronic pattern (pharmaco- 
phore) that is the current candidate. In other 
words, one is limited to predicting the pres- 
ence or absence of activity, a binary choice. 
Even the presence of the appropriate pattern 
is insufficient to ensure biological activity. For 
example, competition with the receptor for oc- 
cupied space by other parts of the molecule 
can inhibit binding and preclude activity. We 
can thus postulate the following conditions for 
activity: 

1. The compound must be metabolically sta- 
ble and capable of transport to the site for 
receptor interaction (interpretation of in- 
active compounds may be flawed by prob- 
lems with bioavailability). 



Once these conditions are met, we can attempt
to deal with the potency, or binding affinity.
This belongs to the domain of three-dimensional
quantitative structure-activity
relationships (3D-QSARs) (400) and we illustrate
the use of a particular variant, CoMFA
(187, 401), on ACE inhibitors at the end of this
chapter. Condition 3 allows us to utilize
compounds capable of presenting the pharmacophoric
pattern, but incapable of binding, to
help determine the location of receptor-occu- 
pied space in relation to the pharmacophore 
(receptor-mapping) (402). This allows a crude, 
low resolution map of the position of the recep- 
tor relative to the pharmacophoric elements 
and indicates in which directions chemical 
modifications may be productive. 

The number and diversity of compounds 
Figure 3.32. Change in OMAP (projection of three of the five dimensions) as new compounds were
introduced to analysis of ACE inhibitors (397). Left is original OMAP of compound f (Fig. 3.30). Right 
is OMAP after completion of analysis. 



available for analysis determine the method- 
ology to be used. If there is a limited data set, 
then the pharmacophoric approach should be 
assessed first because of its fewer degrees of 
freedom. If no pharmacophoric patterns are 
consistent with the set of analogs, then intro- 
duction of logical molecular extensions to en- 
able the active-site approach is warranted. Op- 
erationally, one first determines the set of 
potential pharmacophoric patterns consistent 
with the set of active analogs [leading to its 
name of Active Analog Approach (398)]. If 
there are sufficient data, then a unique phar- 
macophore, or active-site model, may be iden- 
tifiable. The basic assumption behind efforts 
to infer properties of the receptor from a study 
of structure-activity relations of drugs that 
bind is the idea of complementarity. It follows 
that the stronger the binding affinity, the 
more likely that the drug fits the receptor cav- 
ity and aligns those functional groups that 
have specific interactions in a way comple- 
mentary to those of the receptor itself. Cer- 
tainly, our understanding of intermolecular 
interactions from studies of known complexes 
does not dissuade us of this notion, but may 
make us somewhat skeptical of the naive mod- 
els that often result from such efforts. An- 
drews et al. (403) reviewed efforts of this type 
with regard to CNS drugs. 

Clearly, the key to insight relies on chemi- 
cal modification to determine the relative im- 
portance of functional groups for molecular 
recognition. Often more subtle effects than
the simple presence or absence of a group are
important, and then comparison of molecular
properties becomes of interest. A major impediment
to analysis is the definition of a common
frame of reference by which to align molecules
for comparison. This is equivalent to
solving the three-dimensional pharmacophoric
pattern, and implies that one has distinguished
those properties of the molecules
under consideration in a manner similar to
the receptor. Initial efforts to rationalize
structure-activity relationships (SARs) among
noncongeneric systems were hampered by an
"RMS mentality," that is, a point of view that
required atomic centers to align rather than
overlap of sterically and electronically similar
groupings of atoms. An example would be requiring
the six atoms of aromatic benzene
rings to overlap at each of the six ring
vertices rather than the simple requirements
for coincidence and coplanarity that would
recognize the torus of electron density that the
rings share in common (Fig. 3.33). In congeneric
series, the difficulty in assignment of
correspondence is less (nonexistent by definition).
This allows a variety of approaches, including
those based on molecular graph theory
(404-407), to detect similarities between
molecules that can form the basis of a correla- 
tion analysis. Extrapolation outside of the 
group of congenerically related compounds on 
which the analysis was based would appear 
difficult, if not impossible. 

Figure 3.33. Torus of electron density representing benzene ring. Atom-to-atom correspondences
of ring atoms used in normal fitting routines lead to overconstrained fits.

Although it is simpler to start an analysis
with a congeneric series to identify the recognition
elements, diversity in chemical structures
implies more information regarding the
conformational requirements of the system. A
congeneric series requires that the basic 
chemical framework of the molecule remains 
constant and that groups on the periphery are 
either modified (e.g., aromatic substitution) or 
substituted (e.g., tetrazole for carboxyl func- 
tional group). Implicit in this concept is the 
notion that the compounds bind to the recep- 
tor in a similar fashion and, therefore, the 
changes are localized and comparable for each 
position of modification. Introduction of de- 
grees of freedom in the substituents as well as 
consideration of differences in properties that 
are conformationally dependent, such as the 
electric field, require conformational analysis 
in an effort to determine the relevant confor- 
mation for comparison. 

The problem can be divided into two: what 
are the aspects of the molecules that are in 
common and that may provide the basis for 
molecular recognition, and which conforma- 
tion for each molecule is appropriate to con- 
sider. For the first problem, studies on a con- 
generic series can often yield valuable insight. 
For determination of the three-dimensional
arrangement of the crucial recognition elements,
diversity in the chemical scaffolds imposes
different constraints on possible three-dimensional
patterns and generates an
opportunity for determining a unique solution.

4.2 Searching for Similarity 

4.2.1 Simple Comparisons. To gain insight 
into molecular recognition, subtle differences 
in molecules must be perceived. Comparisons 
can be divided into two categories: those that 
are independent of the orientation and posi- 
tion of the molecule and those that depend on 
a known frame of reference. Simple compari- 
sons deal with properties independent of a ref- 
erence frame. For example, the magnitude of 
the dipole moment is frame independent, but 
the dipole itself is a vectorial quantity depen- 
dent on the orientation and conformation of 
the molecule. Similarly, the bond lengths, va- 
lence angles and torsion angles, and inter- 
atomic distances are independent of orienta- 
tion. The distance matrix, composed of the set 
of interatomic distances (Fig. 3.34), is a conve- 
nient representation of molecular structure 
that is invariant to rotation and translation of 
the molecule, but which reflects changes in 

internal degrees of freedom. The distance 
range matrix is an extension (Fig. 3.34) that 
has two values for each interatomic distance 
Figure 3.34. Distance matrix (a) in which unique
interatomic distances for a particular conformation
of a molecule are stored. Distance range matrix (b)
in which ranges of interatomic distances representing
conformational flexibility of molecule are stored.
U = upper bound, L = lower bound.



representing the upper and lower limits, or 
range, allowed for a given interatomic dis- 
tance arising from the conformational flexibil- 
ity of the molecule. Crippen (408) developed a 
procedure that will generate conformations 
that conform to the constraints represented 
by such a distance range matrix. This ap- 
proach is used to generate structures from ex- 
perimental measurements such as nuclear 
Overhauser effects in NMR experiments. The 
use of distance range matrices in the identification
of pharmacophoric patterns was initially
illustrated by Marshall et al. (398) (Fig.
3.35), and has recently been used by Clark et
al. (409) in three-dimensional databases for
representing the conformational flexibility of
molecules. Pepperrell and Willett (410) examined
several techniques for comparing molecules
by use of distance matrices. Other descriptors
for comparison of pharmacophoric
patterns and retrieval of similar substructures
are under active investigation (411).

Figure 3.35. Distance range matrices used for illustration
of analysis of muscarinic receptors (398). Used
with permission.
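
The two representations discussed in this section can be made concrete with a short sketch. The three-atom fragment and its two conformations are hypothetical coordinates chosen for illustration, and the range matrix here is simply the minimum and maximum over the supplied conformations (an assumption; in practice the bounds would come from a systematic conformational search or from experiment).

```python
import math


def distance_matrix(coords):
    """Symmetric matrix of interatomic distances for one conformation."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def distance_range_matrix(conformations):
    """Lower and upper bounds for every interatomic distance over a set of conformations."""
    matrices = [distance_matrix(c) for c in conformations]
    n = len(matrices[0])
    lower = [[min(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]
    upper = [[max(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]
    return lower, upper


def rotate_z(coords, angle):
    """Rotate all coordinates about the z axis (used only to show invariance)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


# Hypothetical fragment: a bent and an extended conformation.
bent = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
extended = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]

# Rotation plus translation leaves the distance matrix unchanged.
moved = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in rotate_z(bent, 0.7)]
original_dm = distance_matrix(bent)
moved_dm = distance_matrix(moved)

lower, upper = distance_range_matrix([bent, extended])
```

The distance between the first and third atoms changes between the two conformations, so its entry in the range matrix carries distinct lower and upper bounds, whereas the bonded distances have identical bounds.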

4.2.2 Visualization of Molecular Properties (412). Although straightforward displays of
molecular structure have proved to be ex- 
tremely useful tools that enable medicinal 
chemists to visualize molecules and to compare 
their structural properties in three dimensions, 
of even greater potential utility is the display of 
the various chemical and physical properties of 
molecules in addition to their structures. Such 
displays allow the comparison not only of molec- 
ular shapes and three-dimensional structures, 
but also of molecular properties such as internal
energy, electronic charge distribution, and hy- 
drophobic character. A number of different 
properties have been displayed (412) in this 
manner in an effort to gain insight into molecu- 
lar recognition in a series of compounds. 

Among the more useful properties is the 
electrostatic potential. Any distribution of 
electrostatic charge, such as the electrons and 
nuclei of a molecule, creates an electrostatic
potential in the surrounding space that at any 
given point represents the potential of the 
molecule for interacting with an electrostatic 
charge at that point. This potential is a very 
useful property for analyzing and predicting 
molecular reactive behavior. In particular, it 
has been shown to be an indicator of the sites 
or regions of a molecule to which an approach- 
ing electrophile or nucleophile is initially at- 
tracted or from which it is repelled (Fig. 3.36). 
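
As a minimal illustration of the electrostatic potential, the sketch below evaluates the Coulomb potential of a set of atom-centered point charges at an arbitrary point. The water geometry and the partial charges are hypothetical round numbers chosen for illustration, and the point-charge model is itself a simplification of the potential generated by the full electron and nuclear distribution.

```python
import math

# Conversion factor giving kcal/mol when charges are in electron units and
# distances in angstroms (a value commonly used in molecular mechanics).
COULOMB = 332.06


def potential(point, atoms):
    """Potential energy (kcal/mol) of a unit positive charge placed at `point`.

    `atoms` is a list of ((x, y, z), partial_charge) pairs.
    """
    return sum(COULOMB * q / math.dist(point, xyz) for xyz, q in atoms)


# Hypothetical water model: oxygen at the origin, hydrogens toward +y.
water = [
    ((0.000, 0.000, 0.0), -0.8),
    ((0.757, 0.586, 0.0), 0.4),
    ((-0.757, 0.586, 0.0), 0.4),
]

hydrogen_side = potential((0.0, 3.0, 0.0), water)
oxygen_side = potential((0.0, -3.0, 0.0), water)
```

With these assumed charges the potential is positive on the hydrogen side and negative on the oxygen side, the qualitative pattern described for water in Fig. 3.36.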

The major obstacle to use of electrostatic 
potentials in the comparison of different mol- 
ecules has been the sheer volume of informa- 
tion produced. The traditional means of dis- 
playing such large amounts of data has been to 
display the electrostatic potential around a 
molecule as a two-dimensional contour map. 
The advent of computer graphics techniques
has improved the situation by allowing
three-dimensional contour maps to be displayed
in color on the graphics screen and manipulated
in real time along with a display of
the molecule itself. An alternative mode for 
displaying molecular electrostatic potentials 
is to employ a dotted surface representation, 
with the dots taking on an appropriate color 
according to the electrostatic potential value 
at the relevant location. Such techniques were 




Figure 3.36. Molecular electrostatic potential for 
water. Positive potential superimposed on right sur- 
rounding hydrogens. Negative potential on left sur- 
rounding oxygen. 



initially derived to display empirically deter- 
mined potentials on the surface of proteins, 
but have since been used widely to display the 
electrostatic potentials on sets of small mole- 
cules for comparative purposes. 

Other graphical uses of the electrostatic po- 
tential have been developed by Davis et al. 
(413), who were able to graphically align cyclic 
AMP and cyclic GMP, based on the superim- 
position of their respective electrostatic poten- 
tial minima, and by Weinstein et al. (414), who 
oriented 5-hydroxytryptamine and 6-hydroxytryptamine
based on the alignment of an elec-
trostatically derived "orientation vector." 

In a similar procedure to that described for 
the display of electrostatic potential, Cohen 
and colleagues developed a technique whereby 
the steric field surrounding a molecule can be 
displayed on a graphics screen as a three-di- 
mensional isopotential contour map (415). 
The map is generated by calculating the VDW 
interaction energy between the molecule and 
a probe atom or molecule placed at varying 
points around the molecule of interest. This 
interaction energy is then contoured at spe- 
cific levels to give the most stable VDW con- 
tour lines around the molecule, that is, the 
contour that represents the most favorable 
steric position for the probe as it is moved 
around the target. 
Figure 3.37. Calculation of electrostatic and
VDW fields surrounding a series of molecules
in defined orientations is used as a basis for
3D-QSAR correlations in CoMFA (187, 401).
Used with permission.

Equation (from figure):

\[ \mathrm{Bio} = y + a \times S001 + b \times S002 + \cdots + m \times S998 + n \times E001 + \cdots + z \times E998 \]


A similar three-dimensional contour repre- 
sentation of a molecule can be obtained for 
both the electrostatic and steric fields of a mol- 
ecule within the comparative molecular field 
analysis (CoMFA) methodology that has been 
developed by Cramer (187) to investigate 3D- 
QSARs (400). In this procedure, the molecule 
is surrounded by a regular lattice of points, at 
each point of which a van der Waals and an 
electrostatic interaction energy between the 
molecule and a probe atom is computed (Fig. 
3.37). Isocontours can then be generated 
around individual molecules, displayed graph- 
ically, and they can be statistically compared 
throughout a series of molecules in an attempt 
to generate 3D-QSARs and hence to rational- 
ize activity data. This is very similar to the 
GRID program (186), which uses various 
probe groups (416) to map potential interac- 
tions around a molecule. Inductive logic pro- 
gramming has been combined with CoMFA to 
develop a new approach (417) to pharmaco- 
phore mapping that does not require explicit 
superimposition of compounds. 
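
The lattice calculation can be sketched as follows. This is an illustration under stated assumptions rather than the CoMFA implementation: a single Lennard-Jones well depth and radius are used for every atom, the probe carries a unit positive charge, and energies are truncated at an assumed cutoff; all parameter values and function names are hypothetical.

```python
import math
from itertools import product


def lattice(lo, hi, spacing):
    """Regular cubic lattice of points spanning [lo, hi] along each axis."""
    steps = int(round((hi - lo) / spacing)) + 1
    axis = [lo + i * spacing for i in range(steps)]
    return list(product(axis, repeat=3))


def probe_fields(atoms, points, probe_charge=1.0, eps=0.1, rmin=3.4, cutoff=30.0):
    """Steric (6-12) and electrostatic probe energies at each lattice point.

    `atoms` is a list of ((x, y, z), partial_charge) pairs. Energies are clipped
    at +/- cutoff so that points inside the molecule do not dominate.
    """
    steric, electro = [], []
    for p in points:
        s = e = 0.0
        for xyz, q in atoms:
            r = max(math.dist(p, xyz), 1e-6)  # avoid division by zero on an atom
            s += eps * ((rmin / r) ** 12 - 2.0 * (rmin / r) ** 6)
            e += 332.06 * probe_charge * q / r
        steric.append(min(s, cutoff))
        electro.append(max(min(e, cutoff), -cutoff))
    return steric, electro


# Hypothetical two-atom "molecule" with opposite partial charges.
molecule = [((0.0, 0.0, 0.0), -0.5), ((1.2, 0.0, 0.0), 0.5)]
points = lattice(-2.0, 2.0, 2.0)
steric, electro = probe_fields(molecule, points)
```

Each molecule in a series, placed in the same orientation on the same lattice, yields one such pair of vectors; the entries play the role of the steric (S) and electrostatic (E) columns in the equation of Fig. 3.37.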



In situations where, either from previous 
QSAR work or from experimental evidence, it 
is known or suspected that differences in the 
reactivity of a set of molecules are attributed 
primarily to their hydrophobic rather than 
their electrostatic properties, it is probably of
more use to compare molecular surfaces that 
display hydrophobicity or polarity informa- 
tion. Indeed, dotted molecular surfaces color- 
coded by hydrophobic character have been 
used very successfully by Hansch and cowork- 
ers to rationalize QSARs from several differ- 
ent systems (418,419). This concept has been 
extended to calculate the hydrophobic field 
surrounding a molecule by Kellogg and Abraham
(420, 421) and utilized in CoMFA studies.

4.3 Molecular Comparisons 

To compare molecules in a general way, a 
means of superposition, or correctly orienting 
the molecules in the same reference frame, 
must be available. A procedure for positioning 
an atom in the molecule at the center of the 
coordinate frame with other atoms positioned 
Figure 3.38. Construction of dummy vector perpendicular
to plane of aromatic ring at centroid that
allows superposition and coincidence of aromatic
rings by fitting endpoints (Du) of dummy vector
without requiring superposition of ring atoms.



along coordinate axes can be used, or the mol- 
ecules can be successively fit to one that is 
used as the standard orientation. Danziger 
and Dean (422) described an approach that 
will find geometric similarities in positions of 
hydrogen-bonded atoms between two mole- 
cules. Least-squares-fitting procedures for 
designated atoms allow selectivity in orienting 
the molecules with predetermined conforma- 
tions in the most appropriate manner. Kears- 
ley (423) described an efficient method for fit- 
ting a series of molecules when atom-atom 
associations have been previously defined be- 
tween members of the series. In some cases, 
the use of dummy atoms allows geometric su- 
perposition of groups such as aromatic rings 
without requiring superposition of the atoms 
composing the ring. By defining the centroid 
of the ring and erecting a normal to the plane 
of the ring, the dummy atom at the end of the 
normal and the centroid dummy atom can be
used to superimpose the ring on another ring 
with similar dummy atoms (Fig. 3.38). This 
method leads to coincidence and coplanarity of 
the two ring systems without requiring the 
atoms composing the rings to be coincident. In 
other words, the rings can be viewed as two 
toruses of electron density without overemphasizing
the positions of the atomic nuclei. In
numerous studies [see review by Andrews et
al. (403)] of biogenic amine ligands, this
method of comparison of the aromatic ring 
components is essential to allow alignment of 
the nitrogens. 
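
The dummy-atom construction can be sketched in a few lines. The ring coordinates below are an idealized planar hexagon, and the helper names are hypothetical; the point of the example is that two rings in the same plane with the same center give identical dummy atoms even though their ring atoms do not coincide.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def ring_dummies(ring, length=1.0):
    """Return (centroid, tip): dummy atoms at the ring center and along its normal."""
    c = centroid(ring)
    normal = [0.0, 0.0, 0.0]
    # Sum cross products of consecutive centroid-to-atom vectors to get the normal.
    for i in range(len(ring)):
        a = [ring[i][k] - c[k] for k in range(3)]
        b = [ring[(i + 1) % len(ring)][k] - c[k] for k in range(3)]
        normal[0] += a[1] * b[2] - a[2] * b[1]
        normal[1] += a[2] * b[0] - a[0] * b[2]
        normal[2] += a[0] * b[1] - a[1] * b[0]
    size = math.sqrt(sum(x * x for x in normal))
    tip = tuple(c[k] + length * normal[k] / size for k in range(3))
    return c, tip


def hexagon(radius, offset_deg):
    """Idealized planar six-membered ring in the xy plane."""
    return [
        (radius * math.cos(math.radians(offset_deg + 60 * i)),
         radius * math.sin(math.radians(offset_deg + 60 * i)),
         0.0)
        for i in range(6)
    ]


ring_a = hexagon(1.39, 0.0)
ring_b = hexagon(1.39, 30.0)  # same plane and center, atoms rotated by 30 degrees

center_a, tip_a = ring_dummies(ring_a)
center_b, tip_b = ring_dummies(ring_b)
```

Fitting the two dummy atoms of one ring onto those of another therefore enforces coincidence and coplanarity while leaving rotation within the ring plane free.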

4.3.1 Volume Mapping. One method of dis- 
playing molecular surfaces that retains the 
ability to transform the display interactively 
has been developed by Marshall and Barry 
(424). The procedure involves computing a 
molecular pseudo-electron density map on a 
Figure 3.39. Set of parameters to generate pseudo-electron
density maps of molecules that can be contoured
to approximately represent VDW surface
(Ho and Marshall, unpublished).



three-dimensional grid that surrounds the 
molecule whose atoms are replaced by dummy 
Gaussian atoms. Atom types are characterized 
by a half-width and an integrated density, chosen
so that the Gaussians have a fixed value at
a distance equal to the VDW radius (Fig. 3.39). 
Such density maps may be contoured in three 
dimensions to provide a chicken wire-like en- 
velope around the molecule that corresponds 
to the van der Waals surface. 

A concomitant benefit of this technique is 
that estimates of the molecular surface area 
and volume are generated as by-products of 
the contouring routines, whether the surface 
is being drawn around one or several mole- 
cules. Additionally, the generated surfaces 
and volumes are readily susceptible to logical 
operations, such as union, intersection, or 
subtraction, enabling the rapid determination 
of, for example, union or difference volumes 
among a series of molecules. 
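
The logical operations on volumes can be illustrated with a simplified stand-in for the contoured density map. Instead of Gaussian pseudo-electron density, the sketch below marks grid points that fall inside hard VDW spheres, which is an assumption made to keep the example short; the coordinates, radii, and grid spacing are hypothetical.

```python
import math
from itertools import product


def occupied_voxels(atoms, spacing=0.5):
    """Grid indices whose centers lie inside any atom's VDW sphere.

    `atoms` is a list of ((x, y, z), radius) pairs in angstroms.
    """
    voxels = set()
    for (x, y, z), radius in atoms:
        lo = [math.floor((c - radius) / spacing) for c in (x, y, z)]
        hi = [math.ceil((c + radius) / spacing) for c in (x, y, z)]
        for i, j, k in product(*(range(a, b + 1) for a, b in zip(lo, hi))):
            if math.dist((i * spacing, j * spacing, k * spacing), (x, y, z)) <= radius:
                voxels.add((i, j, k))
    return voxels


def volume(voxels, spacing=0.5):
    """Approximate volume as the number of occupied voxels times the voxel volume."""
    return len(voxels) * spacing ** 3


# Hypothetical molecules already placed in a common frame of reference.
mol_a = occupied_voxels([((0.0, 0.0, 0.0), 1.7)])
mol_b = occupied_voxels([((0.0, 0.0, 0.0), 1.7), ((1.5, 0.0, 0.0), 1.7)])

shared = mol_a & mol_b        # intersection volume
union = mol_a | mol_b         # union volume
extra_in_b = mol_b - mol_a    # difference volume: occupied by B but not by A
```

Because the voxels are stored as sets, union, intersection, and subtraction are single operations, and the volume estimate falls out of the same data structure.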

Once one has fixed the molecules in a com- 
mon frame of reference, then comparison by a 
variety of techniques becomes feasible. As an 
example, differences in volume may be important
in understanding the lack of activity
in compounds that appear to possess all the
prerequisites for activity seen in others in the
series. In a congeneric series, a significant portion
of the molecular structure is common to
the molecules under comparison. This com- 
mon volume that is shared logically should not 
contribute to differences in activity. By sub- 
traction of the volume shared by two mole- 
cules, one obtains a difference map in which 
the volume occupied by one molecule and not 
the other remains (398). Correlations between 
the shared volume and the biological activity 
of a congeneric series of inhibitors of DHFR 
have been shown by Hopfinger (425). Simon 
and his colleagues (426) emphasized the use of
both overlapping volume and nonoverlapping 
volume in QSAR studies in a quantitative 
methodology, the minimal steric difference, or 
MTD method. This approach has been en- 
hanced to allow comparison of low energy con- 
formers of each molecule and use of those that 
are sterically most similar. An application to 
substrates of acetylcholinesterase illustrates 
this facility (427). 

4.3.2 Field Effects. Once the frame of refer- 
ence has been established, other properties of 
molecules, such as the electrostatic field, can 
be compared as well. Because the electrostatic 
properties can be sampled on a grid, differ- 
ences between the values of two molecules can 
be calculated and a difference map contoured. 
Such difference maps (428) highlight more 
clearly the similarities and differences be- 
tween molecules. Hopfinger (429) integrated 
the difference between potential fields and 
showed this parameter to be useful in QSAR 
studies. 
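
A difference map of this kind reduces to a point-by-point subtraction once both fields are sampled on the same grid. The sketch below assumes each field is stored as a dictionary keyed by grid point; the numerical values are hypothetical, and summing absolute differences times the cell volume is only one simple way to obtain a single integrated number, not necessarily the measure used in the cited work.

```python
def difference_map(field_a, field_b):
    """Pointwise difference of two fields sampled on the same grid points."""
    return {p: field_a[p] - field_b[p] for p in field_a.keys() & field_b.keys()}


def integrated_difference(field_a, field_b, cell_volume):
    """Sum of absolute field differences weighted by the grid cell volume."""
    diff = difference_map(field_a, field_b)
    return sum(abs(v) for v in diff.values()) * cell_volume


# Hypothetical potentials (kcal/mol) at three shared grid points.
field_a = {(0, 0, 0): 1.0, (1, 0, 0): -2.0, (0, 1, 0): 0.5}
field_b = {(0, 0, 0): 0.5, (1, 0, 0): -2.0, (0, 1, 0): 2.5}

diff = difference_map(field_a, field_b)
score = integrated_difference(field_a, field_b, cell_volume=0.125)
```

Grid points where the two molecules generate the same potential drop out of the map, leaving only the regions where the fields differ.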

An approach to statistically quantifying the 
similarity between two molecular electrostatic 
potential surfaces was developed by Dean and 
coworkers (430, 431) and by Richards and coworkers
(215). Here, the previously determined
molecular electrostatic potential sur-
faces are projected outward onto surrounding 
spheres that provide a common surface of ref- 
erence, and then statistical analyses are per- 
formed over the points on this common sur- 
face in an attempt to quantify the similarities 
or differences between the two molecules under
consideration. Burt and Richards (432) introduced
flexibility in the comparison of molecules
based on their electrostatic potential
fields. 

4.3.3 Directionality. If one is comparing 
molecules that share interaction at a common 
site on a biological macromolecule, it is logical 
to assume that they may do so by interacting 
with similar sites in the receptor with optimal 
interaction shown by molecules with correctly 
oriented functional groups. If one does not 
have a three-dimensional model of the recep- 
tor from which to deduce potential interactive 
sites, then one can only attempt to deduce the 
potential interactive receptor-subsites by ex- 
amination of the molecules that interact with 
them. Systematically, one can vary the confor- 
mation of a molecule and record the relative 
orientation of groups postulated, or shown ex- 
perimentally, to play a dominant role in inter- 
molecular interactions. In this way, one can 
map out the directionality of interactions of 
each functional group of the ligand in a com- 
mon frame of reference. Comparison of these 
maps can often lead to hypotheses regarding 
pharmacophoric groups and their correspon- 
dence between molecules. 

4.3.4 Locus Maps. One can generate a lo- 
cus plot in coordinate space showing all the 
potential locations of one group relative to an- 
other by fixing one group in a particular orien- 
tation as a frame of reference and recording all 
possible coordinates of the other. An example 
would be the relative positions of the basic ni- 
trogen to the aromatic ring in compounds such 
as dopamine interacting with biogenic amine 
receptors. One must choose the common frag- 
ment (in the example, the aromatic ring) of 
each molecule and its orientation to generate a 
similar frame of reference, so that the locus of 
positions of the atom (the basic nitrogen) leads 
to a meaningful comparison across a series of 
molecules (Fig. 3.40). 

Figure 3.40. Locus of sterically allowed positions
of nitrogen atom in dopamine relative to aromatic
ring.

4.3.5 Vector Maps and Conformational
Mimicry. Often, one is more interested in
assessing the directionality of potential
interaction rather than simply looking for overlap of
atoms such as the basic nitrogen. In this case,
for example, one is interested in determining
both the locus of the lone pair of the nitrogen
and the nitrogen itself, because the ordered pair
of coordinates determines a vector in the chosen
frame of reference. The resulting plot of the locus of
all possible vectors of the nitrogen lone pair
constitutes a vector map. The combination of
positional information with relative orientation
offers considerable insight into potential
interactions with a hypothetical receptor. The
work of Lloyd and Andrews (233) postulating
a common theme in CNS receptors based on
an underlying biogenic amine pattern can be
rationalized using the vector-map approach.

The use of vector maps is essential to the 
assessment of conformational mimicry, in 
that one attempts to determine the statistical 
probability that the conformation essential for
activity will be preserved with a given chemical
modification. An example will serve to illustrate
this concept and its application. Modification
of amide bonds (introduction of
amide isosteres) in peptide drugs to increase 
metabolic stability may alter the potential ac- 
cessible conformations. This may preclude the 
compound containing the isostere from adopt- 
ing the correct orientation for receptor recog- 
nition and activation. In the general case, one 
has no specific information regarding which 
particular conformation is biologically rele- 
vant and can only assess whether the chemical 
modification mimics the amide bond in its con- 
formational effects. This can be quantitatively 
assessed by the comparison of the percentage 
of vectors of the vector map of the parent 
amide bond that can be found in a comparable 
vector map of the analog. 

Figure 3.41. Vector map of the orientations of the
Cα-Cβ bond of Ala-1, with the methylamide fixed as a
frame of reference, of the dipeptide Ac-Ala-Ala-NH-CH3
in which the central amide bond was cis (433).
Used with permission.

Work by Zabrocki et al. (433) on the use of
1,5-disubstituted tetrazole rings as surrogates
for the cis-amide bond illustrates this application.
The linear dipeptide, acetyl-Ala-Ala-methylamide,
with the amide bond between
the two alanine residues in the cis-conformation,
and the tetrazole analog,
acetyl-AlaΨ[CN4]Ala-methylamide, were modeled
using the coordinates derived from diketopiperazines
for the cis-amide bond or from
the crystal structure of the cyclic tetrazole
dipeptide. A systematic, or grid, search, which 
determines the sterically allowed conforma- 
tions by systematically varying the torsional 
degrees of freedom, was used to generate a 
Ramachandran plot for each of the pairs of 
backbone torsional angles (φ, ψ) associated
with each amino acid residue. The rigid geometry
approximation was used with the set of
scaled VDW radii, shown by Iijima et al. (109)
to reproduce the experimental crystal data for 
proteins and peptides. When the cis-amide 
dipeptide model was calculated, the orientations
of the Cα-Cβ bond of Ala-1 with the methylamide
fixed as a frame of reference were
recorded for each sterically allowed conformation
(Fig. 3.41). Use of the same orientation of
the methylamide in the tetrazole allowed the
program to determine which vectors, or orientations
of the Ala-1 side chain relative to the
methylamide, were common to both dipep- 
tides. Alternatively, the acetyl group was used 
as the fixed frame of reference and the side- 
chain orientation of Ala-2 was used to monitor 
conformational mimicry. Because the quanti- 
tative results were essentially the same, the 
measurement of mimicry was shown to be in- 
dependent of the chosen frame of reference. A 
torsional increment of 10 degrees was used, 
and a side-chain vector was assumed to correspond
if both the carbon-α and carbon-β were
within 0.2 Å of the coordinates of another vector.
The percentage of orientations available
to the parent that are also available to the analog is
referred to as the conformational mimicry index.
For the tetrazole surrogate of the cis-amide
bond, the conformational mimicry index
is 88% [the number of vectors (747)
common to both the tetrazole and cis-amide
divided by the total number of vectors (849)
allowed for the cis-amide]. The tetrazole analog
has more conformational freedom than the
cis-amide model, with 33,359 conformers allowed
compared to 14,912 allowed for the
cis-amide of the 36^4 (or 1,679,616) possible
conformations. This difference was easily visualized
in plots of the vector maps for the two
dipeptides.
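
The index calculation described above can be sketched in a few lines. This is a minimal illustration, not the program used by Zabrocki et al.: the function names, the toy coordinates, and the brute-force pairwise comparison are all assumptions made here. Each vector is stored as a pair of points (carbon-α, carbon-β) in the common frame of reference, and two vectors are taken to correspond when both points lie within the 0.2 Å tolerance quoted in the text.

```python
import math

def corresponds(vec_a, vec_b, tol=0.2):
    """True if both endpoints (C-alpha, C-beta) of two vectors lie within tol angstroms."""
    return all(math.dist(p, q) <= tol for p, q in zip(vec_a, vec_b))

def mimicry_index(parent_vectors, analog_vectors, tol=0.2):
    """Percentage of parent vectors that are matched by at least one analog vector."""
    if not parent_vectors:
        raise ValueError("parent vector map is empty")
    matched = sum(
        1 for pv in parent_vectors
        if any(corresponds(pv, av, tol) for av in analog_vectors)
    )
    return 100.0 * matched / len(parent_vectors)

# Hypothetical toy vector maps: ((C-alpha xyz), (C-beta xyz)) per allowed conformer.
parent = [
    ((0.0, 0.0, 0.0), (1.5, 0.0, 0.0)),
    ((0.0, 0.0, 0.0), (0.0, 1.5, 0.0)),
    ((0.0, 0.0, 0.0), (0.0, 0.0, 1.5)),
    ((1.0, 0.0, 0.0), (2.5, 0.0, 0.0)),
]
analog = [
    ((0.05, 0.0, 0.0), (1.55, 0.05, 0.0)),  # close to the first parent vector
    ((0.0, 0.1, 0.0), (0.0, 1.6, 0.0)),     # close to the second parent vector
    ((0.0, 0.0, 0.0), (0.0, 0.0, 1.4)),     # close to the third parent vector
    ((3.0, 3.0, 3.0), (4.5, 3.0, 3.0)),     # matches nothing in the parent map
]

index = mimicry_index(parent, analog)  # three of four parent vectors are matched

# The figures quoted in the text for the tetrazole surrogate:
reported = 100.0 * 747 / 849
```

Dividing by the number of parent vectors, rather than by the number of analog vectors, follows the bracketed definition in the text (747 common vectors out of 849 allowed for the cis-amide).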

A more recent example of the use of vector 
maps to evaluate conformational similarity is 
an application to β-turn mimetics by Ball et al.
(434, 435). This led to a recognition that many
of the various turn types described in peptides
based on their backbone dihedral angles lead
to quite similar topographical arrangements
of the side chains. A new parameter, β [the
dihedral angle formed by the backbone atoms
C(1)-αC(2)-αC(3)-N(4)], was described (Fig.
3.42) that more readily facilitated comparison 
of the topography of the system. 
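
For readers who wish to compute such a parameter, a dihedral angle can be obtained from four sets of atomic coordinates with the standard two-argument arctangent formula. The sketch below is generic: the function name and the test coordinates are hypothetical, and nothing here is taken from the cited papers beyond the choice of four atoms (C of residue 1, the α-carbons of residues 2 and 3, and N of residue 4) that would be passed in.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle in degrees: 0 for cis (eclipsed), 180 for trans."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)          # normal to the plane p1-p2-p3
    n2 = _cross(b2, b3)          # normal to the plane p2-p3-p4
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates standing in for C(1), alphaC(2), alphaC(3), N(4).
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```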

4.4 Finding the Common Pattern 

If one assumes that a common binding mode 
exists for two or more compounds, then one 
can use the computer to verify the geometric 
feasibility of the assumption. One needs to de- 
termine whether it is possible for the two mol- 
ecules to present a common geometric ar- 
rangement of the designated “important” 
functional groups for recognition. There are
two distinct approaches to this problem. The
first, which is associated with minimization
methodology, focuses on the existence issue: Is
there a conformation that is energetically accessible
to each of the molecules under consideration
that will place the designated functional
groups in a similar orientation? The
second approach attempts to systematically
enumerate all possible conformations and
thereby derive all possible orientations or patterns
to determine the set of patterns shared
by the compounds under study. The latter approach,
when it can be applied, can directly
address the question of uniqueness of the common
pattern.

Figure 3.42. Definition of new parameter β, the
dihedral angle between the backbone atoms
C(1)-αC(2)-αC(3)-N(4) of peptides, used to describe the
topography of reverse turns (434, 435).

The search for the global minimum, or 
complete set of low energy minima, on a poten- 
tial surface is a common problem in science 
and engineering that does not have a general 
solution. Numerous approaches in chemistry 
have been used: most commonly stochastic 
methods such as distance geometry (408), mo- 
lecular dynamics, and Monte Carlo sampling. 
Although distance geometry and molecular 
dynamics are widely used in the elucidation of 
solution conformations from NMR data, they 
have problems in conformational sampling 
and homogeneous treatment of data from
rigid and mobile domains. In general, the difficulties
with most methods are similar to
those seen with minimization procedures. If
one is in the area of the global minimum, then
one is likely to converge to that solution. Otherwise,
one will be trapped in some local minimum.
In contrast, systematic search methods
are algorithmic, so that all sterically allowed
conformations are generated at the selected
torsional grid parameters. Systematic search
methods, therefore, do not have problems in
sampling and are path independent, but are
combinatorial in complexity, which may limit
the fineness of the sample grid and thus compromise
the results. Only in small systems
such as cycloalkane rings (121) and small peptides
(90, 436) have the potential energy hypersurfaces
been mapped.

Figure 3.43. Simultaneous minimization of molecules
to force overlap of pharmacophoric groups A, B, and C.
Springs represent constraints between groups and
only interatomic forces evaluated.
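
The contrast between path-dependent minimization and a path-independent grid search can be seen on a one-dimensional toy problem. The torsional energy function below is an invented, butane-like example (the barrier heights are arbitrary assumptions), and the steepest-descent routine is a deliberately naive stand-in for any local minimizer; neither comes from the methods cited above.

```python
import math

def energy(theta_deg):
    """Toy torsional energy: global minimum at 180 deg, local minima near 64 and 296 deg."""
    t = math.radians(theta_deg)
    return (1.0 + math.cos(t)) + 1.5 * (1.0 + math.cos(3.0 * t))

def local_minimize(start_deg, step=0.01, iterations=2000):
    """Naive steepest descent in the torsion angle; converges to the nearest minimum."""
    t = math.radians(start_deg)
    for _ in range(iterations):
        gradient = -math.sin(t) - 4.5 * math.sin(3.0 * t)  # dE/dt, t in radians
        t -= step * gradient
    return math.degrees(t)

def grid_search(increment_deg=10):
    """Systematic scan: evaluate every grid point and keep the lowest-energy one."""
    return min(range(0, 360, increment_deg), key=energy)

trapped = local_minimize(50.0)   # starts in the gauche well and stays there
best_on_grid = grid_search(10)   # independent of any starting geometry
```

With one torsion the grid costs 36 evaluations at a 10° increment; with n torsions it costs 36 to the power n, which is the combinatorial limitation noted in the text.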

4.4.1 Constrained Minimization. In cases 
where one has internal degrees of freedom, 
besides the six associated with position and 
orientation, the use of constrained minimization
procedures becomes a useful technique.
Often the standard molecule for comparison
has a fixed conformation and the molecule to
be fitted has internal degrees of freedom. Several
groups have published methods for dealing
with this problem. When one has simultaneous
degrees of freedom in both the
molecule to be fitted and the target, a different
approach with simultaneous minimization of
all variables is recommended (Fig. 3.43).

The combination of molecular mechanics 
with flexible minimization routines allows 
penalty functions to be assigned to force geo- 
metrical correspondence of groups, whereas 
individual molecules have their internal en- 
ergy evaluated, but are invisible to the other 
molecules under consideration. A program has 
been described (437) with this capability and 
its use illustrated on histamine antagonists by 
Naruto et al. (438). Template forcing allows 
one molecule to be set up as a template and 
another molecule to be constrained to overlap 
in a specified manner. The strain energy in- 
volved in forcing correspondence gives an up- 
per-bound estimate of the distortion energy 
required, given that the results depend on the 
initial-problem definition. 
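
The form of the objective function implied by this description can be sketched as follows. This is only an illustration under stated assumptions: a harmonic penalty with an arbitrary force constant is used for the "springs," the internal energies are supplied as precomputed numbers rather than evaluated by a force field, and the function names are hypothetical. It is not the program of reference 437.

```python
import math

def overlap_penalty(groups_a, groups_b, force_constant=10.0):
    """Harmonic spring penalty between corresponding pharmacophoric groups."""
    return sum(0.5 * force_constant * math.dist(p, q) ** 2
               for p, q in zip(groups_a, groups_b))

def total_objective(internal_energies, group_sets, force_constant=10.0):
    """Internal energy of every molecule plus springs between every pair of molecules.

    No nonbonded terms are evaluated between molecules, so each molecule
    is "invisible" to the others except through the penalty function.
    """
    total = sum(internal_energies)
    for i in range(len(group_sets)):
        for j in range(i + 1, len(group_sets)):
            total += overlap_penalty(group_sets[i], group_sets[j], force_constant)
    return total

# Hypothetical positions of groups A, B, and C in two molecules.
molecule_1 = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
molecule_2 = [(0.1, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]

objective = total_objective([1.0, 2.0], [molecule_1, molecule_2])
```

A minimizer would vary the torsions of all molecules simultaneously to lower this sum; the residual internal energy at convergence is the strain estimate referred to in the text.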

An alternative approach uses the distance 
geometry paradigm, in which all the con- 
straints are combined to form the distance 
matrix from which energetically feasible con- 
formations of the set of molecules are sought 
mathematically. Sheridan et al. (439) demonstrated
this approach on acetylcholine analogs
that are muscarinic agonists. Both of these ap- 
proaches ask the same question and suffer 
from the same limitations, and differ only in 
computational technique. Each suffers from 
the local minima problem, in that each uses a 
minimization technique, and the results will
be dependent on the starting geometries of the 
initial set of molecules. Both have the advan- 
tage that the unique constraints imposed by 
particular molecules enter consideration at an 
early stage and minimize comparison of 
conformations. 

Another variant recently reported by 
Hodgkin et al. (440) uses a Monte Carlo search 
procedure to generate candidate pharmacophoric
patterns. A reduced force-field parameter
set is used initially to lower energy barriers
between conformations to ensure greater con- 
figurational sampling. Candidate pharma- 
cophores are then refined to produce low en- 
ergy conformations of molecules overlaid in a 
common binding mode. Application to antag- 
onists of the human platelet-activating factor 
led to a consistent binding model for a set of 
five diverse structures when active-site hydro- 
gen-bonding groups were postulated. Barakat 
and Dean (441, 442) utilized simulated an- 
nealing to optimize structure matching by 
minimizing the difference matrix between the 
two molecules. A somewhat similar approach 
is that of Perkins and Dean (443), who used 
simulated annealing to search conformational 
space followed by cluster analysis for each 
molecule, with subsequent comparison of a 
small number of diverse conformers between 
different molecules. 

4.4.2 Systematic Search and the Active Analog
Approach. Once the existence of a common
pattern has been determined, then the
issue of uniqueness needs to be addressed. The 
Active Analog Approach (398) uses a system- 
atic search to generate the set of sterically al- 
lowed conformations based on a grid search of 
the torsional variables at a given angular in- 
crement. For each sterically allowed confor- 
mation, a set of distances between the postu- 
lated pharmacophoric groups are measured. 
The set of distances, each of which represents 
a unique pharmacophoric pattern, constitutes 
an orientation map (OMAP). Each point of the OMAP is simply a
submatrix of the distance matrix and, as such,
is invariant to global translation and rotation
of the molecule. If the initial assumption is
valid, that the same binding mode of interaction,
or pharmacophoric pattern, is common
to the set of molecules under consideration,
then the OMAP for each active molecule must
contain the pattern encrypted in the set of dis- 
tances. By logically intersecting the set of 
OMAPs, one can determine which patterns 
are common to all molecules (444). In other 
words, all potential pharmacophoric patterns 
consistent with the activity of the set of mole- 
cules can be found by this simple manipula- 
tion of OMAPs, and the question of unique- 
ness addressed directly (Fig. 3.44). 
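
The logical intersection of OMAPs lends itself to a compact set-based sketch. The code below is an illustration only: the binning resolution, the function names, and the toy conformers are assumptions introduced here, and the simple rounding used for binning would miss matches that straddle a bin edge. Each conformer is reduced to the list of coordinates of its postulated pharmacophoric groups, given in the same order for every molecule.

```python
import itertools
import math

def omap(conformers, resolution=0.25):
    """Set of binned inter-group distance tuples, one tuple per allowed conformer."""
    points = set()
    for groups in conformers:
        distances = [math.dist(p, q) for p, q in itertools.combinations(groups, 2)]
        points.add(tuple(round(d / resolution) for d in distances))
    return points

def common_patterns(omaps):
    """Patterns present in the OMAP of every molecule."""
    return set.intersection(*omaps)

# Hypothetical conformers of two molecules, three pharmacophoric groups each.
molecule_a = [
    [(0, 0, 0), (3, 0, 0), (0, 4, 0)],   # distances 3, 4, 5
    [(0, 0, 0), (3, 0, 0), (0, 0, 6)],
]
molecule_b = [
    [(1, 1, 1), (1, 4, 1), (5, 1, 1)],   # same 3, 4, 5 pattern, translated and rotated
    [(0, 0, 0), (2, 0, 0), (0, 2, 0)],
]

shared = common_patterns([omap(molecule_a), omap(molecule_b)])
```

Because only distances are stored, the translated and rotated conformer of the second molecule maps onto the same OMAP point as the first, which is the invariance property noted in the text. If `shared` contains a single point, the pattern is unique at the chosen resolution.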

A good example is the work of Nelson et al. 
(445) on the receptor-bound conformation of 
morphiceptin. Based on structure-activity 
data, the tyramine portion and phenyl ring of 
residue three of morphiceptin, Tyr-Pro-Phe-Pro-NH2,
were postulated to be the pharmacophoric
groups responsible for recognition
and activation of the opioid μ-receptor. It was
assumed further that the aromatic rings
bound to the receptor in the different analogs
were coincident and coplanar. A series of active
analogs with a variety of conformationally
constrained amino acid analogs in positions
two and three were analyzed. A unique conformation
was found for the two most constrained
analogs that allowed overlap of the
Phe and Tyr portions of the molecules (Fig.
3.45). In this case, a five-dimensional orientation
map with distances between the nitrogen
and normals to the two aromatic rings was
used in the analysis.

The Active Analog Approach (Fig. 3.46) is 
appropriate for the unknown receptor prob- 
lem, given that no objective criteria function, 
such as potential energy, can be used a priori
in the absence of information regarding the 
receptor. Adequate sampling of the potential 
surface to ensure that the complete set of local 
minima is found is still problematic because of 
the phenomenon known as "grid tyranny." 
This relates to the fact that the combinatorial 
explosion that results by decreasing the incre- 
ment of the torsion angles scanned limits one 
to a finite increment for a given problem, say, 
10° for a seven-rotatable bond problem. Be- 
cause the energetics of the system is very sen- 
sitive to interatomic distances, a conformation 
generated at the 10° increment may be sterically
disallowed, but very close to a minimum.
Relaxation of the structure might find the 
relevant conformation, for example, by allowing
a torsional angle to vary by 1°. Improvements
in algorithms described in the
following section have helped to overcome
this problem.

Figure 3.44. OMAPs generated for two molecules can be logically intersected to determine which
three-dimensional patterns are common.

Figure 3.45. Conformations of two constrained
analogs of morphiceptin in which aromatic rings of
Tyr-1 and Phe-3 are overlapped (445).

4.4.3 Strategic Reductions of Computational
Complexity. Logically, the Active Analog
Approach can be conceived as sequentially
determining all the sterically allowed conformations
for each molecule under consideration,
generating an OMAP from those conformations,
and logically intersecting the
OMAPs to determine the common pharmacophoric
patterns. A simple analysis will easily
convince one that this is not feasible because
of the computational complexity of the problem.
For example, the set of 28 ACE inhibitors
(Fig. 3.31), analyzed by Mayer et al. (397),
has a total of 163 torsional degrees of freedom
that have to be explored to find a common
pattern, as seen in Table 3.1. If we were to
determine all possible conformations for each
molecule at a 10° torsional scan, the scan parameter
(s) = 10° and the number of torsional
increments r = 360°/s, or 36. For each molecule
with $\omega_i$ torsional degrees of freedom, there are
$r^{\omega_i}$ possibilities to be examined.
For the set of molecules there are
$(6 \times 36^3) + (7 \times 36^5) + (3 \times 36^6) + (5 \times 36^7) + (6 \times 36^8) + (1 \times 36^9)$
possible conformations to be generated
and examined. If one compares each
conformation of each molecule with all the
conformations of the other molecules to find
possible correspondences, the combinatorials
of the problem explode and one reaches the
same level of complexity as a complete conformational
search of a peptide of 30 residues at a
10° scan (not currently feasible).
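
The size of this enumeration is easily checked. In the sketch below, the pairs of torsional degrees of freedom and molecule counts are those read from Table 3.1; they are treated here as an assumption, chosen because they reproduce the stated totals of 28 molecules and 163 degrees of freedom. The variable names are arbitrary.

```python
SCAN_DEG = 10                      # scan parameter s, in degrees
increments = 360 // SCAN_DEG       # r = 360/s = 36 settings per torsion

# (torsional degrees of freedom, number of molecules), as read from Table 3.1
table = [(3, 6), (5, 7), (6, 3), (7, 5), (8, 6), (9, 1)]

molecules = sum(count for _, count in table)
total_dof = sum(dof * count for dof, count in table)

# Each molecule with dof rotatable bonds contributes r**dof grid conformations.
total_conformations = sum(count * increments ** dof for dof, count in table)
```

The sum is of the order of 10^14 conformations before any pairwise comparison between molecules is attempted, and it is dominated by the single molecule with nine rotatable bonds.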

One is not interested in the conformational
hyperspace of the set of the inhibitors, but
rather the three-dimensional patterns common
to the total set of inhibitors. Many conformations
of a molecule often map into one
three-dimensional pattern. Transformation of
the multidimensional conformational hyperspace
into a smaller-dimensioned OMAP
space reduces the number of objects for comparison.
If one starts with the most constrained
inhibitor (fewest torsional degrees of
freedom) and determines an OMAP for it,
then one can use the upper and lower distance
bounds as constraints for searches for the next
molecule. In other words, one looks only
where there are possible solutions to the problem.
A more advanced approach simply examines
each candidate solution from the initial
OMAP to see whether all the other molecules
are capable of presenting the same pattern. By
changing the focus to the hypothesis of a common
three-dimensional pattern, a more efficient
approach has been devised (Fig. 3.46)
(399). Clearly, the algorithms that one chooses
to do the problem are important.

Figure 3.46. The flow of information in the Active Analog Approach (111, 399). Sterically allowed
conformations (represented by filled circles on the ω1, ω2 torsional grid) of a molecule are determined
and the distances (d1, d2, etc.) between pharmacophore elements are recorded for each. The resulting
OMAP is used to constrain the next molecule in the series. Ideally, once all of the molecules have been
evaluated, only a single point or cluster of points remains in the OMAP.

Table 3.1 Degrees of Torsional Freedom to
Specify ACE Active Site Geometry

Degrees of        Number of
Freedom (ω_i)     Molecules     Total
3                 6             18
5                 7             35
6                 3             18
7                 5             35
8                 6             48
9                 1             9
Totals            28            163

4.4.4 Alternative Approaches. A conceptu- 
ally similar approach to receptor mapping has 
been taken by Ghose and Crippen (446-449), 
who used the distance geometry method to an- 
alyze site points and drug interactions. A site 
model was postulated with some initial esti- 
mates of force constants between the appro- 
priate portion of the ligand and the site point. 
The binding energy for a particular binding 
mode can be calculated: 

$$E_{\text{calcd}} = -c\,E_c + \sum_{i}\sum_{m} x_{i,m}$$

where $E_c$ is the conformational energy, $c$ is a
coefficient to be fit, and $x_{i,m}$ is the interaction of a site
point i with the bound ligand point m, which 
depends on their types. The novel aspect of 
this approach was the use of distance geometry
to generate a variety of conformers binding
within the postulated site and then finding a
set of force constants between the postulated
site points and ligand points that will predict 
the affinities of the compounds in the data set 
when bound in their optimal manner. With a 
site model of 11 attractive site points and 5 
repulsive ones for DHFR, Ghose and Crippen 
(447) were able to derive force constants that 
fit 62 molecules, with an = 0.90, and pre- 
dict the activity of 33 molecules, with an = 

0.71. The compounds, however, are essentially 
an extended congeneric series because the 
core recognition portion of the inhibitor, the 
pyrimidine ring, is common to all the 
compounds. 

Linschoten et al. (450) extended Crippen's 
method by use of lipophilicity to describe the 
binding of parts of the ligand to lipophilic areas
of the receptor. Through the use of only a
nine-point model of the turkey erythrocyte
β-receptor and six energy parameters, they
successfully modeled 58 compounds. Distance
geometry approaches to receptor-site modeling
have been reviewed (449, 451).

Simon and his coworkers have developed 
(426) a quantitative 3D-QSAR approach, the 
minimal steric (topologic) difference (MTD) 
approach. Oprea et al. (452) compared MTD 
and CoMFA on affinity of steroids for their 
binding proteins and found similar results. 
Snyder and colleagues (453) developed an au- 
tomated method for pharmacophore extrac- 
tion that can provide a clear-cut distinction 
between agonist and antagonist pharmaco- 
phores. Klopman (404, 454) developed a pro- 
cedure for the automatic detection of common 
molecular structural features present in a 
training set of compounds. This has been used 
to produce candidate pharmacophores for a 
set of antiulcer compounds (404). Extensions
(454) of this approach allow differentiation between
substructures responsible for activity
and those that modulate the activity. 

Bersuker and Dimoglo (455) described a 
matrix-based approach that combines geomet- 
ric and electronic features of a molecule, the 
electron-topological approach. For each mole- 
cule, an electron-topological matrix of congru- 
ity (ETMC) is constructed based on a con- 
former selected by conformational analysis. 



The ETMC is essentially an interatomic dis- 
tance matrix (Fig. 3.47), with the diagonal ele- 
ments containing an electronic structural pa- 
rameter (atomic charge, polarizability, HOMO 
energy, etc.). Off-diagonal elements for two at- 
oms that are chemically bonded are used to 
store information regarding the bond (bond 
order, polarizability, etc.). Matrices for active 
compounds in a series are then searched for 
common features that are not shared by inac- 
tive compounds. The successful examples 
cited are predominately for small, relatively 
rigid structures where the conformational pa- 
rameter does not confuse the analysis. 
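
The layout of such a matrix can be illustrated with a short sketch. Everything specific below is hypothetical: the three-atom fragment, its coordinates, the charges placed on the diagonal, and the bond orders stored for bonded pairs are invented for illustration, and the function is not the procedure of Bersuker and Dimoglo.

```python
import math

def etmc(coords, diagonal, bonds):
    """Symmetric matrix: electronic parameter on the diagonal, bond descriptor
    for bonded pairs, interatomic distance for all other pairs."""
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = diagonal[i]
        for j in range(i + 1, n):
            value = bonds.get((i, j), bonds.get((j, i)))
            if value is None:                      # not bonded: store the distance
                value = math.dist(coords[i], coords[j])
            matrix[i][j] = matrix[j][i] = value
    return matrix

# Hypothetical three-atom fragment (coordinates in angstroms).
coords = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (1.2, 1.6, 0.0)]
charges = [-0.4, 0.3, 0.1]                 # diagonal: atomic charges
bond_orders = {(0, 1): 2.0, (1, 2): 1.0}   # off-diagonal entries for bonded atoms

m = etmc(coords, charges, bond_orders)
```

Searching the matrices of active compounds for a shared submatrix, within tolerances, that is absent from the inactive compounds is the comparison step described in the text and is not attempted here.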

Martin et al. (456) developed a strategy for 
determining both the bioactive conformation 
and a superposition rule for each active mole- 
cule in a data set. In DISCO, a set of low en- 
ergy conformers for each molecule is pro- 
cessed to locate atoms within the molecule and 
extensions for binding-site points for superpo- 
sition. A clique-finding algorithm then finds 
superpositions containing at least one conformation
of each molecule and a user-specified
minimum number of site points. 

Unlike methods that are limited to a pre- 
computed set of rigid conformers, GASP (Ge- 
netic Algorithm Similarity Program) (457) al- 
lows full conformational flexibility of ligands. 
GASP employs a genetic algorithm for deter- 
mining the correspondence between func- 
tional groups in different molecules and the 
alignment of these groups in a common geom- 
etry for receptor binding. For a set of ligands, 
GASP automatically identifies rotatable 
bonds and pharmacophore elements such as 
rings and potential hydrogen-bonding sites. A 
population of chromosomes is randomly con- 
structed, where each chromosome represents 
a possible alignment of all the molecules. 
Chromosomes encode the torsion settings for 
rotatable bonds as well as the intermolecular 
mapping of elements. The fitness score of a 
particular alignment is the weighted sum of 
three terms: the number and similarity of 
overlaid elements, the common volume of all 
the molecules, and the internal van der Waals
energy of each molecule. Using a mutation or
crossover operator, child chromosomes are
produced. Those with improved fitness scores
replace the least-fit members of the existing
population. The calculation terminates when
the fitness of the population fails to improve
by a specified amount, or when the preset
number of genetic operations is completed.
GASP produces several sets of alignments and
their associated pharmacophore elements.

Figure 3.47. The electron-topological matrix of congruity (ETMC) for a 17-atom fragment proposed
by Bersuker and Dimoglo (455) to encode geometrical and electronic features of molecules.

4.4.5 Receptor Mapping. One can attempt 
to decipher physical properties of the receptor 
by use of data from both active and inactive 
analogs. Interpretation of results requires 
some understanding of the interactions be- 
tween ligand and receptor that underlie mo- 
lecular recognition. Oprea and Kurunczi (458) 
reviewed these interactions in the context of 
receptor mapping. A basic assumption is that 
a compound that contains the correct pharma- 
cophoric elements and has the capability of 
positioning them correctly should be active. 
Compounds with these attributes that are in- 
active must be incapable of binding to the re- 
ceptor in the correct orientation; that is, steric 
overlap with the receptor must occur. By cal- 
culating the combined volume of the active an- 
alogs superimposed in the correct orientation, 
one has mapped space that cannot be occupied
by the receptor and that must be available for
binding. Inactive compounds mentioned 
above should possess novel volume require- 
ments, some portion of which is likely to over- 
lap with that occupied by the receptor. As an 
example of receptor mapping, Sufrin et al. 
(402) showed with amino acid analogs of methionine,
which inhibited the enzyme methionine:adenosyl
transferase, that the data for a
set of rigid amino acid inhibitors required the 
postulation of competition between the inac- 
tive analogs and the enzyme for a particular 
volume of space (Fig. 3.48). Summation of the 
volume requirements for the set of com- 
pounds, when oriented on the amino acid 
framework, yielded a minimum space from 
which the receptor could be excluded. Each 
amino acid had the necessary binding ele- 
ments, but several were inactive. Each of the 
inactive analogs required extra volume not re- 
quired by the active analogs and shared a 
small common unique volume whose occu- 
pancy by the enzyme would be sufficient to 
rationalize their inactivity. 
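
The volume arithmetic behind this kind of analysis can be sketched with sets of grid points. The example is a simplification under stated assumptions: molecules are represented as lists of hard spheres (x, y, z, radius) already superimposed in a common frame, the grid spacing is arbitrary, and all coordinates and function names are invented for illustration. It is not the procedure used in reference 402.

```python
import itertools
import math

def occupied_grid_points(atoms, spacing=0.5):
    """Integer grid indices lying inside at least one atomic sphere."""
    points = set()
    for x, y, z, radius in atoms:
        lows = [math.floor((c - radius) / spacing) for c in (x, y, z)]
        highs = [math.ceil((c + radius) / spacing) for c in (x, y, z)]
        ranges = [range(lo, hi + 1) for lo, hi in zip(lows, highs)]
        for i, j, k in itertools.product(*ranges):
            if math.dist((i * spacing, j * spacing, k * spacing), (x, y, z)) <= radius:
                points.add((i, j, k))
    return points

def receptor_map(actives, inactives, spacing=0.5):
    """Union volume of the actives, and the extra volume shared by all inactives."""
    active_volume = set().union(*(occupied_grid_points(m, spacing) for m in actives))
    extras = [occupied_grid_points(m, spacing) - active_volume for m in inactives]
    shared_extra = set.intersection(*extras) if extras else set()
    return active_volume, shared_extra

# Hypothetical superimposed molecules; each atom is (x, y, z, radius).
actives = [[(0.0, 0.0, 0.0, 1.5)], [(1.0, 0.0, 0.0, 1.5)]]
inactives = [
    [(0.0, 0.0, 0.0, 1.5), (0.0, 3.0, 0.0, 1.0)],
    [(0.0, 0.0, 0.0, 1.5), (0.0, 3.2, 0.0, 1.0)],
]

active_volume, shared_extra = receptor_map(actives, inactives)
```

Here `active_volume` approximates the space from which the receptor can be excluded, and `shared_extra` approximates the small common volume whose occupancy by the receptor would be sufficient to explain the inactive compounds.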

Figure 3.48. Example of receptor mapping of set of
enzyme inhibitors that can be aligned on
common amino acid framework.
Set of inactive compounds all require
common novel volume when
compared with active compounds
(402). Used with permission.

Klunk et al. (459) used separate receptor
mapping of two different chemical classes of
ligands to support the hypothesis that they
bound to the same site. Calder et al. (460) argued
that a successful correlative CoMFA
model for 36 compounds of six chemical
classes of GABA inhibitors indicated that the
alignments used were significant. In some
cases, comparison of volume maps for two receptors
has allowed optimization of activity
at one receptor with respect to the other. The
work of Hibert et al. (461, 462), through the
use of receptor mapping to increase the selectivity
of a lead compound for the 5-HT1A receptor
over the α1-adrenoreceptor, has resulted
in clinical trials for a novel chemical
class. This steric-mapping approach has become
relatively popular, and numerous examples
appear in current journals (463) on a regular
basis.

Although there are several feasible algo- 
rithms to deal with unions of molecular vol- 
umes, the use of pseudoelectron density func- 
tions calibrated to reproduce VDW radii (424) 
with three-dimensional contouring to repre- 
sent the surface has allowed mathematical 
manipulation of the density associated with 
each lattice point to allow for union, intersection,
and subtraction of volumes. Analytical
representation of molecular volumes by Con- 
nolly (464, 465) and solvent-accessible sur- 
faces by Kundrot et al. (466) may be an alter- 
native that would allow optimization of 
volume overlap, for example, by minimizing 
the difference in volume between two struc- 
tures. The solvent-accessible surface area can 
be used to approximate the free energy of hy- 
dration and a rapid, numerical procedure for 
its calculation has been reported (467). 

4.4.6 Model Receptor Sites. One of the first 
visualizations of a receptor model is that of 
Beckett and Casey (468) for the opiate recep- 
tor published in 1954. Because morphine and 
many other compounds active at this receptor 
are essentially rigid, the model did not have to 
address the interaction of myriad numbers of 
flexible, naturally occurring opioid ligands, 
such as endorphins and enkephalin, which 
were only subsequently discovered. The model 
receptor had an anionic site to bind the 
charged nitrogen, a hydrophobic flat surface 
with a cleft to bind the phenyl ring, and a hydrophobic
hydrocarbon bridge seen in morphine.
Kier (469) published a number of papers
attempting to define the pharmacophore
based on semiempirical molecular orbital calculations
of in vacuo minimum-energy conformations.
Although his basic concepts were
valid, his emphasis on the global minima in
vacuo limited his scope of applicability.

Figure 3.49. Peptidic pseudoreceptor used to calculate
affinity of NMDA agonists and
antagonists (453). Used with
permission.

Humber et al. (470) used semirigid antipsychotic
drugs, the so-called neuroleptics, which
antagonize CNS dopamine transmission and 
displace dopamine from its receptor, to formu- 
late a geometrical arrangement of receptor 
groups to rationalize their activity. Olson et al. 
(471) used this model to design a novel ste- 
reospecific dopamine antagonist and success- 
fully predicted its stereochemistry. 

Because we are reasonably convinced the 
receptor is a protein, construction of hypothet- 
ical sites from amino acid fragments and cal- 
culation of affinity for these sites should cor- 
relate with observed affinity, assuming that 
the type of interactions and their geometry is 
represented by the site in some reasonable 
manner. An individual fragment such as an 
indole ring from tryptophan does a good job of 
simulating a flat hydrophobic surface. Höltje
and Tintelnot (472) constructed a site for
chloramphenicol from arginine and histidine
by varying the distances of the amino acid
from its postulated binding position and finding
the optimal distance for correlation with
observed affinity for the ribosome. Peptidic
pseudoreceptors have been constructed (453)
that correctly rank-order glutamate NMDA 
agonists and antagonists (Fig. 3.49). 

An intermediate between unknown recep- 
tors and ones where the three-dimensional 
structure is known are models based on homol- 
ogy. For the medicinal chemist, the G-protein 
receptors have been of intense interest and numerous
models (339, 340, 461, 473) of the various
receptor types have been developed based on
their presumed three-dimensional homology
with bacteriorhodopsin (474). Mechanisms of
signal transduction (475) and differences between
agonists and antagonists (476) have even
been rationalized based on such models. Nordvall
and Hacksell (341) recently combined the
construction of such a model for the muscarinic
m1 receptor with constraints derived from steric
mapping of muscarinic agonists. By adding the 
experimental constraints from ligand binding, a 
qualitative model was derived that was able to 
reproduce experimentally derived stereoselec- 
tivities. 
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4.4.7 Assessment of Model Predictability. 

Because it is unlikely that there will be suffi- 
cient structure-activity data to uniquely de- 
fine a model at atomic resolution in competi- 
tion with crystallography, justification for 
model building must come from its potential 
predictive power and possible insight into the 
receptor-drug interaction before detailed 
three-dimensional information from either 
crystal structure or NMR studies. Certainly, 
the questions regarding the ability of a pro- 
posed drug to bind to the active site without 
steric conflict with the receptor can be ad- 
dressed by the methods outlined above in a 
qualitative manner. The resolution of our re- 
ceptor models is too crude, however, to subject 
them to molecular mechanics estimates of af- 
finities. There are alternative paradigms, 
however, based on pattern recognition tech- 
niques in which a set of analogs and their 
activities are used, along with their physico- 
chemical parameters, to generate a mathe- 
matical model that relates the values of the 
physicochemical parameters for a given ana- 
log with its activity. One such paradigm is 
comparative molecular field analysis (CoMFA), 
which combines the three-dimensional elec- 
trostatic and steric fields surrounding the an- 
alogs with powerful statistical techniques, 
partial least squares (PLS) (477) and cross- 
validation, to generate predictive models if a 
set of orientation rules is available for aligning
the molecules for comparison and prediction.
Alternative methods for assessing similarity
and their use in QSAR schemes have
been compared (215) with CoMFA. Another 
approach is the use of neural nets that learn to 
"see" patterns in much the same way as our 
own nervous system processes information. 
Examples of the use of this pattern-recogni- 
tion approach include classification of mecha- 
nism of action for cancer chemotherapy (478) 
and QSAR studies of DHFR inhibitors (479, 
480) and carboquinones (481). Machine learn- 
ing has also been applied (482) to the QSAR 
problem. Trimethoprim analogs were success- 
fully analyzed for their inhibition of DHFR 
and similar results to the original Hansch re- 
sults were obtained. It is not clear that this 
paradigm could be applied to noncongeneric 
series, at least as outlined. 



What appears crucial to such studies is the 
choice of training set, which encompasses as 
much of parameter space as one is likely to use 
in the predictive mode as well as tests of the 
predictive ability of resulting models. Given 
that one is dealing with a situation in which 
the number of variables is larger (often several 
times) than the number of observations, lin- 
ear regression models are not applicable be- 
cause chance correlations are highly probable. 
The use of cross-validation allows selection of 
correlations that are predictive in a self-consistent
manner within the training set. This
does not imply that such internally
self-consistent models have predictive power
outside the training set or its extremely close
congeners.
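
The danger of chance correlation when variables outnumber observations can be illustrated with a minimal sketch. It is not the PLS procedure used in CoMFA: as a stand-in, it fits purely random "descriptors" to purely random "activities" with NumPy's minimum-norm least squares, then compares the fitted r^2 with a leave-one-out cross-validated r^2. The data, the sizes (20 observations, 100 variables), and the function names are all assumptions made for illustration.

```python
import numpy as np

def fitted_r2(X, y):
    """r^2 of a least-squares fit judged on the same data used to fit it."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ coef
    return 1.0 - (residual @ residual) / np.sum((y - y.mean()) ** 2)

def loo_q2(X, y):
    """Leave-one-out cross-validated r^2: each observation is predicted
    by a model fitted to all the other observations."""
    n_obs = len(y)
    press = 0.0
    for i in range(n_obs):
        keep = np.arange(n_obs) != i
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        press += (y[i] - X[i] @ coef) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Random descriptors and random activities: there is nothing to learn here.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))   # 20 observations, 100 variables
y = rng.normal(size=20)

r2_fit = fitted_r2(X, y)
q2 = loo_q2(X, y)
print(f"fitted r^2 = {r2_fit:.3f}, cross-validated r^2 = {q2:.3f}")
```

The fit reproduces the random activities essentially perfectly, while the cross-validated value shows that nothing predictive was learned. This is the sense in which cross-validation screens out chance correlations within the training set.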

DePriest et al. (483, 484) applied the 
CoMFA methodology to a series of 68 ACE 
(angiotensin-converting enzyme) inhibitors 
representing 28 different chemical classes. 
Through use of the binding-site geometry de- 
termined by Mayer et al. (397), a CoMFA 
model with a statistically significant cross-validated
$r^2$ and considerable predictive ability
for inhibitors outside of the training set was 
derived. Because the geometry of the ACE in- 
hibitors was determined computationally by 
an active-site analysis rather than experimen- 
tally, a comparison of the results of the ACE 
series against thermolysin inhibitors, for
which there were crystallographic data to ex- 
plicitly define the binding-site geometry and 
the resulting alignment rules, was made, 
given that thermolysin is also a zinc-contain- 
ing metallopeptidase with numerous similari- 
ties between ACE and thermolysin. Their re- 
sults give strong support to both the Active 
Analog Approach (398) used to define the 
alignment rule for the ACE series and the 
CoMFA methodology itself. In the absence of 
an experimentally known active-site geome- 
try, correlations were derived that explain as 
much as 84% of the variance in activities 
among a set of 68 diverse ACE inhibitors by 
use of CoMFA steric and electrostatic poten- 
tials plus a zinc indicator variable (Fig. 3.50). 
When the set of 68 ACE inhibitors was divided into
three classes and correlations were derived for
each class, CoMFA parameters alone explained
79-99% of the variance in activities. It was
notable that statistically significant correlations
were found, in spite of the fact that
CoMFA does not explicitly consider hydrophobicity
or solvation. In further support of the
active-site paradigm, the cross-validated results
of the ACE series were equivalent to
those of the thermolysin series (cross-validated
$r^2$ = 0.65 to 0.70), for which the alignment
rule was defined by crystallographic
data.

Figure 3.50. Plot of experimental versus
predicted inhibition constants for 68 ACE
inhibitors used in derivation of CoMFA
model for the ACE active site (484). This plot
shows the self-consistency of the model.
Used with permission. Axis label: Actual (pIC50).


The predictions for molecules outside the
training sets are a valid test of the predictive
ability of the model, rather than just a confirmation
of self-consistency of the derived
model. In other words, statistical analysis
alone does not answer the question of a chance
correlation (485) for the training set. One
must investigate lateral correlations such as
predictability. The predictive correlations presented
by DePriest et al. (483, 484) represent a
total of 66 diverse inhibitors that were not
chosen as analogs of compounds present in the
training set, but by selecting published papers
on three different chemical classes and testing
all compounds in the papers [predictive $r^2$ =
0.46 for the set of 66 compounds predicted,
which had not been included in the training
set for the ACE model with a zinc indicator of
10 (Fig. 3.51)]. The "predictive" $r^2$ was based
only on molecules not included in the training
set and was defined as

$$\text{predictive } r^2 = \frac{SD - \text{press}}{SD}$$

where SD is the sum of the squared deviations
between the affinities of molecules in the
test set and the mean affinity of the training
set molecules, and "press" is the sum of the
squared deviations between predicted and actual
affinity values for every molecule in the
test set. It should be obvious from the equation
that prediction of the mean value of the
training set for each member of the test set
would yield a predictive $r^2$ = 0. Of the 66
predicted molecules, 35 had residuals less than
one log value, with a predictive $r^2$ value for the
collective set of these 35 test molecules of 0.90.
Of the 31 inhibitors with residuals greater
than 1.0, 8 were carboxylates, 12 were phosphates,
and 11 were thiols. Clearly, no single
class of inhibitors dominated the distribution
of residuals. Considering both the composition
and the method of selection of the test data
sets (range of activities over 7 log units), the
fact that more than 50% of the molecules were
predicted with correlations greater than $r^2$ =
0.90 lends strong support to the use of CoMFA
as a tool for QSAR development.
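
The definition of the "predictive" r^2 given above translates directly into a few lines of code. The sketch below follows that definition (SD taken about the training-set mean, "press" summed over the test set); the function name and the small numerical example are hypothetical and are not data from the studies cited.

```python
def predictive_r2(train_actual, test_actual, test_predicted):
    """Predictive r^2 = (SD - press) / SD, where SD is measured about the
    mean affinity of the TRAINING set and press over the TEST set."""
    train_mean = sum(train_actual) / len(train_actual)
    sd = sum((actual - train_mean) ** 2 for actual in test_actual)
    press = sum((pred - actual) ** 2
                for pred, actual in zip(test_predicted, test_actual))
    return (sd - press) / sd

# Hypothetical affinities (log units), invented for illustration only.
train = [5.0, 6.0, 7.0]          # training-set mean is 6.0
test = [4.0, 8.0]

r2_mean_guess = predictive_r2(train, test, [6.0, 6.0])  # always guess the mean
r2_perfect = predictive_r2(train, test, [4.0, 8.0])     # exact predictions
r2_poor = predictive_r2(train, test, [8.0, 4.0])        # worse than the mean

print(r2_mean_guess, r2_perfect, r2_poor)
```

Guessing the training-set mean for every test molecule gives exactly zero, perfect prediction gives one, and predictions worse than the mean give a negative value. A value such as 0.46 therefore represents real information beyond the training-set average.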

Use of CoMFA as a predictive tool for recep- 
tors of known three-dimensional structure 
has also been explored. Klebe and Abraham

Figure 3.51. Plot of experimental versus predicted inhibition constants for 35 ACE inhibitors not
used in derivation of CoMFA model (484). This plot indicates the predictability of the model. Used 
with permission. 



(486) used two enzymes (thermolysin and renin)
as well as antiviral activity against
human rhinovirus, where the coat-protein receptor
is known, to calibrate CoMFA methodology.
They concluded that only enthalpies of
binding and not binding affinities were predicted
by CoMFA. Waller et al. (264) developed
a predictive CoMFA model for the binding affinities
of HIV-protease inhibitors based on
crystal structures of complexes. Initial analysis
of the 59 molecules in the training set
representing five structurally diverse classes
(hydroxyethylamine, statine, norstatine, ketoamide,
and dihydroxyethylene) of transition-state
protease inhibitors yielded a correlation
with a cross-validated $r^2$ value of 0.786. To
evaluate the predictive ability of this model, a
test set of 18 additional inhibitors (487) was
used that represented another class of transition-state
isostere, hydroxyethylurea. The
model expressed good predictive ability for the
test set of hydroxyethylurea compounds
($r^2_{pred}$ = 0.624) with all compounds predicted
within 1.06 log units (1.4 kcal/mol in binding affinity)
of their actual activities, with an average
absolute error of 0.58 log units (0.8 kcal/mol)
across a range of 3.03 log units (Fig. 3.52). Predictions
from this CoMFA model of HIV protease
are being used to prioritize synthesis of de
novo-designed HIV-protease inhibitors not included
in development of the model.

Crippen developed a method (488) to objectively
model the binding of small ligands to
receptors, given the experimentally determined
affinities of a set of ligands. The procedure,
Vorom, used Voronoi polyhedra to generate
the simplest geometrical model of the
binding site. In a recent application to DHFR 
inhibitors (489), only eight analogs were used 
in the training set to derive the model and the 
affinities of 23/39 of the test set molecules 
were correctly predicted, with an average rel- 
ative error of 0.83 kcal/mol for the remaining 
compounds. 

5 CONCLUSIONS 

Rapid advances in molecular and structural
biology have provided ample therapeutic targets
characterized in three dimensions. Tools
to exploit this information are being rapidly
developed and several strategies for de novo
design of ligands, given an active site, are under
investigation. It is already clear, however,
that iterative approaches are necessary because
of the lack of precision in predicting affinities
for bound ligands. Molecular mechanics
and computer graphics are essential
components for design of novel ligands, and
rapid progress in evolving a useful set of tools
is apparent.

Figure 3.52. Plot of experimental
versus predicted inhibition constants
for 18 HIV-1 protease inhibitors not
used in derivation of CoMFA model
(264). This plot indicates the predictability
of the model.

The ultimate goal in comparison of mole- 
cules with respect to their biological activity is 
insight into the receptor and its requirements 
for recognition and activation. Conjecture re- 
garding the receptor is often a necessary part 
of rationalizing a set of structure-activity 
data. Although the problem of characterizing 
the active site of an unknown macromolecule 
indirectly is certainly challenging, the analy- 
sis of structure-activity data of a set of ligands, 
especially if their structural variety is wide, 
allows useful models of active sites to be devel- 
oped. There are numerous caveats that must 
be acknowledged, however, such as flexibility 
of the receptor, multiple binding modes for li- 
gands, and lack of uniqueness of most models 
because of limited experimental observations. 
Success in using these methods would appear 
to be increasing. This reflects technological
advances as well as insight into the problem
and algorithmic improvements in our analytical
approaches.



The game of 20 questions with receptors 
has progressed with experience. Ambiguity in 
interpretation of results and multiple models 
clearly reflect the uncertainties inherent in 
this indirect approach. Nevertheless, the absence
of direct experimental data in many biological
systems of intense therapeutic interest
makes this the only game available for
many. It is hoped that the next decade will see
further progress in our ability to extract three-dimensional
information from structure-activity
studies on unknown receptors.

This perspective has examined the ap- 
proaches to molecular modeling and drug de- 
sign and emphasized their limitations. The 
reader should be aware, however, that these 
tools are daily used on many problems of ther- 
apeutic interest with increasing success. This 
is clearly witnessed by publications of such 
studies in almost every issue of current major 
journals. For specific application areas, such
as RNA (490, 491), DNA (492-496), membrane
(497-507), or peptidomimetic modeling
(382, 508-513), the reader is referred to the
literature. The prediction of molecular properties,
such as log P and correlation between
substructures and metabolism, has led to a
dramatic increase in efforts to correlate absorption,
distribution (514), metabolism (515-517),
and elimination (ADME) with chemical
structure (518-522). In addition, the advent
of combinatorial chemistry has focused mod- 
eling efforts on prioritizing compounds (523- 
528) for high throughput screening based on 
chemical diversity (529-531), druglike prop- 
erties (532, 533), predicted oral bioavailability 
(534, 535), and so forth.

1 INTRODUCTION

This chapter describes the forces that hold to- 
gether complexes between large and small 
molecules, particularly where the large mole- 
cule is a protein or nucleic acid and the small 
molecule is an inhibitor or substrate. Forces 
between atoms are conventionally divided into 
the two categories of covalent and noncovalent 
"bonds." A covalent bond is an attractive in- 
teraction between two atoms in which each 
contributes a valence electron. For example, 
such a bond is formed between two hydrogen 
atoms to make the H2 molecule: H + H →
H-H. It also includes what most chemists
might consider "ionic" bonds such as Na + Cl
→ Na-Cl, even though the valence electron
pair in this case is much closer to the chlorine 
atom than to the sodium atom. The conven- 
tional study of chemical reactions is devoted to 
describing the strengths of covalent bonds and 
to understanding the ways in which they are 
formed and broken (1). 

Drug-receptor interactions, on the other 
hand, are generally influenced most by 
weaker, noncovalent "bonds," where electron 
pairs are "conserved" in reactants and products.
Examples of such interactions are "dative
bonds," e.g., H3N: + BH3 → H3N:BH3, and
hydrogen bonds, e.g., H2O + H2O →
H2O...HOH. It is these noncovalent bonds
that provide the "force" to make drugs inter- 
act strongly with their targets. 

Some sample potential energy curves for
covalent and noncovalent interactions between
two atoms are given in Fig. 4.1. The left
side shows an interaction curve for the two
oxygen atoms in the O2 molecule. This has a
large dissociation energy $D_0$ (about 117 kcal/mol
in this case), so that at room temperature,
where RT approximates 0.6 kcal/mol (R is the
universal gas constant and T is the absolute
temperature), the fraction of "broken" bonds
at equilibrium, $e^{-D_0/RT}$, is very small. By contrast,
noncovalent bonds are much weaker,
typically 1-10 kcal/mol, and thus much easier
to break. The right side of Fig. 4.1 shows interaction
curves for the two sodium atoms in
the Na2 dimer; this interaction is somewhat
stronger than a typical hydrogen bond but has
about the same shape. Also shown is the
purely nonbonded interaction between two
oxygen atoms in different water molecules.
Here the $D_0$ value is so small (about 0.15 kcal/mol)
that it really cannot be seen on the scale
of this figure. Hence, a significant fraction of
nonbonded interactions can be broken at
equilibrium at room temperature. It is this
weakness of noncovalent bonds that makes
them so useful in biological processes, because
a small change in the chemical environment
(such as temperature, concentrations, or ionic
strength) can form or break such a bond. Probably
the best known important noncovalent
bonds are those between the strands of DNA,
where hydrogen bonds hold the double helix
together. When the cells begin to replicate,
chemical signals (e.g., proteins binding to the
DNA) shift the equilibrium to the single-stranded
DNA, breaking these hydrogen
bonds. Other important examples of noncovalent
complexes include those between enzyme
and substrate, "receptor" protein and hormone,
antibody and antigen, and intercalator
and DNA.
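
The contrast between covalent and noncovalent bond strengths is easy to put into numbers. The sketch below evaluates the Boltzmann factor exp(-D0/RT), which the text uses as the fraction of "broken" bonds, with RT = 0.6 kcal/mol. The 117 and 0.15 kcal/mol values are those quoted above; the 5 kcal/mol entry is an assumed mid-range value for a typical noncovalent bond, and the function name is illustrative.

```python
import math

RT = 0.6  # kcal/mol, approximate value at room temperature (from the text)

def broken_fraction(d0_kcal_per_mol):
    """Boltzmann factor exp(-D0/RT), used here as the text uses it: a rough
    measure of the fraction of 'broken' bonds at equilibrium."""
    return math.exp(-d0_kcal_per_mol / RT)

examples = [
    ("O2 covalent bond", 117.0),
    ("typical noncovalent bond (assumed 5 kcal/mol)", 5.0),
    ("O...O nonbonded contact between waters", 0.15),
]
for label, d0 in examples:
    print(f"{label:48s} D0 = {d0:6.2f}  exp(-D0/RT) = {broken_fraction(d0):.3g}")
```

The covalent bond is effectively never broken thermally, a 5 kcal/mol interaction is broken a small but non-negligible fraction of the time, and the weakest nonbonded contact is broken most of the time.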

Much of our concern in this chapter is with 
the interaction: 

$$\text{drug} + \text{receptor} \underset{k_r}{\overset{k_f}{\rightleftharpoons}} \text{complex}$$

The rate constant for association of the
complex is $k_f$; the rate constant for dissociation
of the complex is $k_r$; and the affinity, or
association constant, is $K_a = k_f/k_r$. It is usually
assumed that the biological activity of a drug
is related to its affinity for the receptor,
although there are processes such as actinomycin
D-DNA interactions in which the rate
of dissociation $k_r$ is more relevant to the biological
activity (2, 3).
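
A short sketch may help separate affinity from dissociation rate. It assumes simple 1:1 binding with the free drug concentration held fixed, and the rate constants are hypothetical values chosen for illustration: two drugs with the same association constant but very different dissociation rates.

```python
def association_constant(k_f, k_r):
    """K_a = k_f / k_r (per molar, for k_f in 1/(M s) and k_r in 1/s)."""
    return k_f / k_r

def fraction_bound(k_a, free_drug_molar):
    """Equilibrium fraction of receptor occupied for simple 1:1 binding."""
    x = k_a * free_drug_molar
    return x / (1.0 + x)

def residence_time(k_r):
    """Mean lifetime of the complex, 1/k_r, in seconds."""
    return 1.0 / k_r

# Hypothetical rate constants (k_f in 1/(M s), k_r in 1/s).
fast_drug = (1.0e7, 1.0e-1)
slow_drug = (1.0e5, 1.0e-3)

for name, (k_f, k_r) in [("fast", fast_drug), ("slow", slow_drug)]:
    k_a = association_constant(k_f, k_r)
    print(f"{name}: K_a = {k_a:.3g} /M, "
          f"fraction bound at 10 nM = {fraction_bound(k_a, 1.0e-8):.2f}, "
          f"residence time = {residence_time(k_r):.0f} s")
```

Both hypothetical drugs occupy half of the receptor at 10 nM, yet one complex lives a hundred times longer than the other. This is the kind of situation in which the dissociation rate, rather than the affinity, may govern activity.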

Figure 4.1. Potential energy curves for atom-atom interactions in O2, Na2, and the O-O interaction
in a water dimer. Note the different energy scales on the left and right. Horizontal axis: atom-atom distance, angstroms.

The thermodynamic parameters of interest
for the reactions above are the standard free
energy ($\Delta G^\circ$), enthalpy ($\Delta H^\circ$), and entropy
($\Delta S^\circ$) of association. These are related by the
equations

$$\Delta G^\circ = -RT \ln K_a$$
$$\Delta G^\circ = \Delta H^\circ - T\Delta S^\circ$$

Thus, measurement of $K_a$ allows one to calculate
$\Delta G^\circ$, the free energy of association of
the complex. To find $\Delta H^\circ$ and $\Delta S^\circ$ separately
requires a determination of $K_a$ as a function
of temperature (if $\Delta H^\circ$ and $\Delta S^\circ$ are relatively
temperature independent, a plot of $\ln K_a$ vs.
$1/T$ can yield $\Delta H^\circ$ and $\Delta S^\circ$) or a calorimetric
measurement of $\Delta H^\circ$ directly. Because $\Delta H^\circ$
and $\Delta S^\circ$ themselves are often quite temperature
dependent, the latter experiment is more
definitive.

This chapter provides some background
about the forces that hold molecules together,
with emphasis on the noncovalent interactions
of interest in biology, and attempts to
relate determinations of the
thermodynamics of association to the forces
involved in the association. The discussion in
the remainder of the chapter is divided into
two parts. First, we discuss the forces that
hold molecules together in the gas phase and
solution and describe how these forces can be
mathematically modeled by fairly simple functions;
second, we discuss biological examples
of noncovalent interactions and analyze the
binding forces in particular cases.

2 ENERGY COMPONENTS FOR INTERMOLECULAR
NONCOVALENT INTERACTIONS



Quantum mechanical calculations on small
molecule association suggest that there are
five major contributions to the energy of intermolecular
interactions in the gas phase (3, 4).
The sum of these is the dissociation energy of
the intermolecular complex represented in
Fig. 4.1. Table 4.1 contains some examples of
magnitudes of the different energy components
for different interactions. This section
provides a qualitative introduction to these
forces. Section 3 gives an overview of mathematical
models suitable for computer calculations.

2.1 Electrostatic Energy 

Given information on the charge distribution
of two molecules A and B, we can evaluate the
electrostatic interaction energy between
them. Although nuclei can be treated as point
positive charges, the negative charge of electrons
is smeared out over space. Thus, a rigorous
evaluation of the electrostatic energy involves
an integration over the electron clouds
of the two molecules. In most practical calculations,
however, the electrons as well as the
nuclei are represented by point charges,
whose position and magnitude are usually
chosen to reproduce known molecular properties.
The strength and the directionality of
A...B electrostatic interactions are usually




Drug-Target Binding Forces: Advances in Force Field Approaches

Table 4.1 Some Examples of Interaction Energies of Noncovalent Complexes (kcal/mol)

Interaction       -ΔE       ΔE_es     ΔE_dis    ΔE_ex     ΔE_pol    ΔE_ct
He...He           0.02(a)   0         -0.028    +0.008    0         0
Xe...Xe           0.64(a)   0         -0.86     0.40      0         0
C6H6...C6H6                 ≠0        ≠0        ≠0        ≠0        ≠0
H2O...H2O                   -9.2      (-1)      4.0       -0.5      -2.2
TCNE...OH2                  -3.9      (-1)      2.0       -0.2      -1.0
Li+...OH2         ~48       -51.1     (-1)      12.7      -7.8      -1.7
F-...OH2          ~41       -37.8     (-1)      20.5      -4.9      -17.9
NH4+...F-         164.7     -181.4    (-1)      61.6      -8.3      -35.6

-ΔE, calculated (or experimental) total interaction energy equal to D0 in Fig. 1, kcal/mol; ΔE_es, electrostatic energy;
ΔE_dis, dispersion energy; ΔE_ex, exchange repulsion energy; ΔE_pol, polarization energy; ΔE_ct, charge transfer energy (values in
parentheses are estimated; TCNE, tetracyanoethylene).

(a) See Karplus and Porter (12).
(b) See Janda et al. (13).
(c) See Umeyama and Morokuma (7); this value for ΔE is certainly too large; see better values in Table 3.
(d) See Morokuma et al. (13).
(e) See Kollman (14).



dominated by the first nonvanishing multipole
moment of the charge distribution,

$$M_n = \sum_{i=1}^{\text{no. charges}} q_i \mathbf{r}_i^{\,n}$$

where $q_i$ are the individual charges and $\mathbf{r}_i$ is
the vector from the origin of the coordinate
system to the ith charge (5, 6). Molecules that
are charged have a nonzero zeroth moment
$M_0$. Ionic crystals such as Na+Cl- are held
together predominantly by electrostatic attraction
between oppositely charged ions.
Crystals of ice I are mainly held together by
dipolar electrostatic forces where $M_0 = 0$ and
$M_1 \neq 0$, because there are virtually no ions in
these crystals. It should be noted here that
"hydrogen bonding" is not a separate energy
component; typically hydrogen bonds contain
important energy contributions from all five
energy components, although the electrostatic
component is usually the largest contributor
to this interaction (7).
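
The zeroth and first moments are simple sums over point charges, as the following sketch shows. The three-site water-like model used here (charges in units of e, coordinates in angstroms) is a hypothetical parameter set chosen only for illustration, and the function names are assumptions.

```python
import numpy as np

def zeroth_moment(charges):
    """M_0: the net charge, sum of q_i."""
    return float(np.sum(charges))

def first_moment(charges, positions):
    """M_1: the dipole vector, sum of q_i * r_i."""
    return charges @ positions

# Hypothetical three-site, water-like point-charge model (illustrative only).
water_charges = np.array([-0.8, 0.4, 0.4])            # O, H, H (units of e)
water_positions = np.array([[0.000, 0.000, 0.0],      # O at the origin
                            [0.757, 0.586, 0.0],      # H
                            [-0.757, 0.586, 0.0]])    # H

m0_water = zeroth_moment(water_charges)
m1_water = first_moment(water_charges, water_positions)
m0_ion = zeroth_moment(np.array([1.0]))                # a bare monovalent cation

print("water: M0 =", m0_water, " M1 =", m1_water, "e*A")
print("water dipole magnitude in debye:", 4.803 * np.linalg.norm(m1_water))
print("cation: M0 =", m0_ion)
```

For the neutral molecule the first nonvanishing moment is the dipole, whereas for the ion it is the net charge. That difference determines the distance dependence of their electrostatic interactions.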

Of the intermolecular energy components,
the electrostatic is the longest range (i.e., it
dies off most slowly with distance as the two
molecules separate). Ion-ion interactions die
off as $1/R$; ion-dipole as $1/R^2$; dipole-dipole as
$1/R^3$, etc. In general, if two molecules have as
their first nonvanishing multipole moments
$M_n$ and $M_m$, the electrostatic interaction energy
between them dies off as $1/R^{n+m+1}$. The
electrostatic interaction energy between water,
a dipolar molecule (n = 1), and benzene,
whose first nonvanishing moment is a quadrupole
(m = 2), dies off as $1/R^4$.
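
The distance rule just stated can be checked with a few lines of code. The sketch below encodes only the exponent n + m + 1 and the resulting attenuation when the separation grows; the function names are illustrative.

```python
def falloff_exponent(n, m):
    """Exponent p in the 1/R^p decay for first nonvanishing moments M_n, M_m."""
    return n + m + 1

def attenuation(n, m, distance_ratio):
    """Factor by which the electrostatic energy shrinks when the separation
    is multiplied by distance_ratio."""
    return distance_ratio ** (-falloff_exponent(n, m))

cases = {
    "ion-ion": (0, 0),
    "ion-dipole": (0, 1),
    "dipole-dipole": (1, 1),
    "water-benzene (dipole-quadrupole)": (1, 2),
}
for label, (n, m) in cases.items():
    print(f"{label:36s} 1/R^{falloff_exponent(n, m)}  "
          f"doubling R leaves {attenuation(n, m, 2.0):.4f} of the energy")
```

Doubling the separation halves an ion-ion interaction but leaves only one sixteenth of a dipole-quadrupole interaction, which is why charged groups dominate long-range recognition.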

2.2 Exchange Repulsion Energy 

The Pauli principle keeps electrons with the
same spin spatially apart. This principle applies
whether one is dealing with electrons on
the same molecule or on different molecules
and is the predominant repulsive force (6) that
keeps electrons of different molecules from interpenetrating
when noncovalent complexes
are formed. This repulsive term is often represented
by an analytical function of the form

$$\frac{A}{R^{n}}, \qquad n = 9 \ \text{or} \ 12$$

where R is the distance between molecules or
nonbonded atoms and A is a constant that depends
on the atom types. However, the best
available quantum mechanical calculations
suggest that this repulsion should diminish
with an exponential dependence on the distance
between the atoms (6). This difference is
only important for very precise calculations:
the key point is that the repulsive energy rises
very quickly once the electrons from two different
atoms overlap significantly. Roughly
speaking, this happens when the distance between
two atoms is less than the sum of their
van der Waals radii. Table 4.2 gives some typical
radii for atoms commonly found in organic
molecules.

Table 4.2 Selected Atomic van der Waals
Radii (in Å)

Element       R_VDW
Hydrogen      1.20
Carbon        1.70
Nitrogen      1.55
Oxygen        1.50
Fluorine      1.50
Phosphorus    1.85
Sulfur        1.80
Chlorine      1.70
Bromine       1.80

Values from A. Bondi, J. Phys. Chem. 68, 441 (1964).
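
As a rough illustration of how such radii are used, the sketch below flags an atom pair as overlapping when its separation is less than the sum of the tabulated radii, and shows how steeply an A/R^n repulsion rises on compression. The radii are copied from Table 4.2; the function names and the example distances are assumptions.

```python
# Radii in angstroms, as listed in Table 4.2.
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.50, "F": 1.50,
             "P": 1.85, "S": 1.80, "Cl": 1.70, "Br": 1.80}

def contact_distance(element_a, element_b):
    """Sum of van der Waals radii: the separation below which the electron
    clouds begin to overlap significantly."""
    return VDW_RADII[element_a] + VDW_RADII[element_b]

def overlaps(element_a, element_b, distance):
    """True when two nonbonded atoms are closer than van der Waals contact."""
    return distance < contact_distance(element_a, element_b)

def repulsion_increase(compression, n=12):
    """Factor by which an A/R^n repulsion grows when R shrinks to
    compression * R (A cancels in the ratio)."""
    return compression ** (-n)

print("C...O contact distance:", contact_distance("C", "O"))
print("C...O at 3.0 A overlaps:", overlaps("C", "O", 3.0))
print("C...O at 3.5 A overlaps:", overlaps("C", "O", 3.5))
print("10% compression, n = 12:", round(repulsion_increase(0.9, 12), 2))
print("10% compression, n = 9: ", round(repulsion_increase(0.9, 9), 2))
```

Squeezing a contact by only 10% multiplies the n = 12 repulsion by about 3.5. In practice the sum of radii therefore behaves almost like a hard-sphere limit.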

2.3 Polarization Energy

When two molecules approach each other,
there is charge redistribution within each molecule,
leading to an additional attraction between
the molecules. The energy associated
with this charge redistribution is invariably
attractive and is called the polarization energy.
For example, if a molecule with polarizability
$\alpha$ is placed in an electric field, E, the
polarization energy is

$$E_{pol} = -\tfrac{1}{2}\alpha E^2$$

If the electric field is caused by an ion, then
$E = q\mathbf{i}/R^2$, where q is the ionic charge, i is the
unit vector along the ion-molecule direction,
and R the ion-molecule distance, which gives
$E_{pol} = -\alpha q^2/(2R^4)$ for this ion-induced dipole
interaction. The corresponding formula for dipole-induced
dipole interaction between two
dipolar molecules is

$$E_{pol} \propto -\frac{\alpha_1\mu_2^2 + \alpha_2\mu_1^2}{R^6}$$

where the $\mu$'s are the dipole moments of the
molecules, the $\alpha$'s are their polarizabilities,
and R is the distance between molecules. The
polarizability of a molecule can be broken
down into atomic contributions [atomic polarizabilities
are additive to a good approximation
(8)], and it depends roughly on the
number of valence electrons, as well as on how
tightly these valence electrons are bound to
the nuclei. Umeyama and Morokuma (9) have
calculated the ion-induced dipole contribution
to the proton affinities of the simple alkyl
amines. They attributed the order of gas-phase
proton affinities in the alkyl amines [NH3 <
CH3NH2 < (CH3)2NH < (CH3)3N] to the
greater polarizability of a methyl group than a
hydrogen. A simple estimate using the above
empirical equation for an ion-induced dipole
interaction with q = +1, a difference
in polarizabilities of a methyl and a hydrogen
(Δα) ≈ 4 cm^3, a proton-methyl distance
of 2.0 Å, and a proton-proton distance of
1.6 Å, leads to an expected increase of ~20
kcal/mol of proton affinity for every methyl
group added to NH3. This very qualitative estimate
is of the right magnitude but about two
to three times too large (see below).
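
The order of magnitude of such an ion-induced dipole estimate can be reproduced with a minimal sketch of E_pol = -alpha q^2 / (2 R^4). The polarizability difference of 2 cubic angstroms used below is an illustrative assumption, not a value taken from the text, and only the 2.0 Å proton-methyl distance is used. The factor 332.06 converts e^2/Å to kcal/mol.

```python
COULOMB_KCAL = 332.06  # kcal*A/(mol*e^2): converts e^2/A to kcal/mol

def ion_induced_dipole_energy(alpha_A3, charge_e, distance_A):
    """E_pol = -alpha * q^2 / (2 R^4) in kcal/mol, with the polarizability in
    cubic angstroms, the charge in units of e, and the distance in angstroms."""
    return -COULOMB_KCAL * alpha_A3 * charge_e ** 2 / (2.0 * distance_A ** 4)

# ASSUMPTION: a polarizability difference of 2 cubic angstroms, chosen only
# to illustrate the size of the effect for a unit charge 2.0 A away.
delta_alpha = 2.0
energy = ion_induced_dipole_energy(delta_alpha, 1.0, 2.0)
print(f"extra stabilization: {energy:.1f} kcal/mol")

# The R^-4 dependence makes the estimate very sensitive to distance:
for r in (1.6, 2.0, 2.5, 3.0):
    print(f"R = {r:.1f} A -> {ion_induced_dipole_energy(delta_alpha, 1.0, r):7.1f} kcal/mol")
```

Under these assumptions the result is about -21 kcal/mol, the same order as the estimate quoted above. The steep R^-4 dependence also shows why such a crude point-polarizability estimate can easily be off by a factor of two or three.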

2.4 Charge Transfer Energy 

When two molecules interact, there is often a
small amount of electron flow from one to the
other. For example, in the equilibrium geometry
of the linear water dimer HOH...OH2,
the water molecule that is the proton acceptor
has transferred about 0.05 e- to the proton donor
water (9, 10). The attractive energy associated
with this charge transfer is the charge
transfer energy and can be thought of as a
mixing of an ionic resonance structure
HOH-...OH2+ into the overall wave
function. Although the charge transfer energy
is an important contributor to the interaction
energy of most noncovalent complexes, the
presence of a "charge transfer" electronic
transition in the visible spectrum does not
mean that the charge transfer energy is the
predominant force holding the complex together
in its ground state. For example, the
complex between benzene and I2, earlier
thought to be a prototype "charge transfer"
complex, seems to be held together predominantly
by electrostatic, polarization, and dispersion
energies in its ground electronic state
(11).

2.5 Dispersion Attraction

There are attractive forces existing between 
all pairs of atoms, even between rare gas at- 
oms (He, Ar, Ne, Kr, Xe), which cause them to 
condense at a sufficiently low temperature. 
None of the other attractive forces (electro- 
static, polarization, charge transfer) can ex- 
plain the attraction between rare gas atoms; it 
is called the dispersion attraction (12). Even 
though the rare gas atoms have no permanent 
dipole moments, they are polarizable, and one 
has instantaneous dipole-dipole attractions 
in which the presence of a locally asymmetric 
charge distribution on one molecule induces 
an asymmetric charge distribution on the 
other molecule, e.g., δ-Heδ+...δ-Heδ+.

The net attraction is called dispersion attraction
(often known as London or van der
Waals attraction) and is dependent on the polarizability
and the number of valence electrons
of the interacting molecules. It dies off
as $1/R^6$, where R is the atom-atom separation.
The difference between this attraction and the
polarization energy is that the latter involves
the interaction of a molecule that is already
polar with another polar or nonpolar molecule.

2.6 Summary 

Having described the components of the inter- 
action energies, let us consider a number of 
specific examples in detail (Table 4.1). Unlike 
the total interaction energy, which can be 
measured experimentally, the individual en- 
ergy components cannot. The theoretical esti- 
mate of these quantities is often dependent on 
the method of calculation, but their qualita- 
tive features are usually independent of meth- 
odology. 

Rare gas-rare gas interactions (He...He
and Xe...Xe) have only dispersion attraction.
The difference between the potential well
depth of He...He and Xe...Xe (Fig. 4.1; $D_0$) at
the equilibrium distance is caused by the
greater polarizability of the xenon atoms, and
thus by the greater dispersion attraction between
them. A simple manifestation of this is
the much higher boiling point of xenon than
helium, caused by the greater attractive forces
in xenon liquid. Although these energies are
individually fairly small, they can add in a molecular
environment to significant energies;
for example, the single largest attractive free
energy contribution to binding in the strongest
known small molecule-macromolecule
interaction (biotin-avidin) is the dispersion attraction
(13).

One might intuitively expect that benzene
dimer would pack together like two flat plates,
but this is not the case in the gas phase (14);
the crystal structure also does not have parallel
alignments of benzene molecules (15). Benzene,
although having no dipole moment, does
have a quadrupole moment ($M_2 \neq 0$). A simple
way to think about this quadrupole moment is
to realize that a benzene C-H is somewhat
electropositive and its electron cloud is rather
electronegative. A second benzene molecule
would like to approach the first one so that its
"electropositive" side approaches the other
molecule's "electronegative" side. Hence the
main component of binding is expected to be
electrostatic in nature. The water dimer
(H2O)2 and the ether...TCNE interactions
are examples of prototypal H bonds and
"charge transfer" complexes, but both are also
held together mainly by electrostatic forces,
although the other attractive energy components
contribute significantly to the total ΔE.
The electrostatic component is predominant
in determining all the structural parameters
except the distance between molecules. Similarly,
the geometry and net attraction between
Li+ and OH2, F- and H2O, and NH4+ and F-
are dominated by the electrostatic energy
component.

3 MOLECULAR MECHANICS FORCE 
FIELDS 

We move now from qualitative considerations
to a more quantitative approach. It has become
clear that a simple molecular mechanical
energy expression can represent noncovalent
interactions surprisingly well (16). Such energy
expressions contain only the first three
terms mentioned above: electrostatic, exchange
repulsion, and dispersion. By a suitable
choice of parameters, charge transfer and
polarization effects are implicitly included in
such an expression, which is simple and easy
to evaluate, along with its derivatives, for
molecules with thousands of atoms. Over the
past quarter century, many interesting applications
of such molecular mechanical
methods to complex molecules have been
carried out (17).

The ideas that are outlined qualitatively
above can also be cast into a useful mathematical
form for computer calculation. The
basic idea is to write down a (fairly simple and
approximate) function that gives the energy of
the system as a function of the positions (or
coordinates) of its atoms. Because the negative
derivative (or gradient) of this function yields the
forces for Newton's equations, such a function
is often called a "force field"; and because molecules
are viewed as being made up of balls
and springs (so that quantum effects are ignored),
the term "molecular mechanics" is
used to represent a concrete, mechanical picture
of molecular motions and energies.

3.1 Biochemical Force Fields 

Equation 4.1 represents about the simplest
functional form of a force field that preserves the
essential nature of molecules in condensed phases.

$$
U(R) = \underbrace{\sum_{\text{bonds}} K_r (r - r_{eq})^2}_{\text{bond}}
     + \underbrace{\sum_{\text{angles}} K_\theta (\theta - \theta_{eq})^2}_{\text{angle}}
     + \underbrace{\sum_{\text{dihedrals}} \frac{V_n}{2} \left(1 + \cos[n\phi - \gamma]\right)}_{\text{dihedral}}
     + \underbrace{\sum_{i<j}^{\text{atoms}} \left( \frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} \right)}_{\text{van der Waals}}
     + \underbrace{\sum_{i<j}^{\text{atoms}} \frac{q_i q_j}{\varepsilon R_{ij}}}_{\text{electrostatic}}
\tag{4.1}
$$

The earliest force fields, which attempted
to describe the structure and strain of small
organic molecules, focused considerable attention
on more elaborate functions of the first
two terms, as well as cross terms (18), representing
a "top down" philosophy.

On the other hand, biochemists, guided by
an interest in proteins and nucleic acids, have
more generally followed a "bottom up" approach
(16, 19, 20). This approach focuses first
on the atomic charges $q_i$. The most general
method to derive the atomic charges is to fit
them to quantum mechanically calculated
electrostatic potentials on appropriately chosen
molecules or fragments. In early attempts
to do this, computational limitations in quantum
mechanical calculations led to the use of a
minimal basis set, STO-3G, to derive the $q_i$ (16).
More recent efforts have used a 6-31G* or
larger basis set (19). The 6-31G* basis set has
the fortunate property that it leads to
charges (dipole moments) that are enhanced
over accurate gas phase experimental values
and thus implicitly builds in "polarization"
effects characteristic of polar molecules in
condensed phases. The fact that this basis set
enhances the polarity by just about the same
amount as the popular water models TIP3P
(21) and SPC (22) (where the charges are empirically
adjusted to reproduce the water enthalpy
of vaporization) is fortunate and
is key in leading to balanced solvent-solvent
and solvent-solute interactions.

van der Waals parameters are generally
dominated by the inner closed shell of electrons
and thus are fortunately far more transferable
than atomic charges. Therefore, generally
only one set of van der Waals parameters
(radius and well depth) per atom type need be
employed, with the important exception of hydrogen
(23). Unfortunately, it is harder to derive
van der Waals parameters than charges
using ab initio quantum mechanics (6, 24).
The alternative that has emerged as a general
model is to empirically calibrate results to fit
experimental liquid structures and enthalpies
(25).

Continuing with the "bottom up" develop- 
ment of a force field, we come to the torsion 
energy term, where the $V_n$ and $\gamma$ either come
from experiment or quantum mechanical cal- 
culations on small molecule models. Whereas 
"top down" force fields often use many terms 
in the Fourier series for rotation around a 
given bond type and attempt to reproduce the 
conformational energy for a collection of mol- 
ecules, most "biochemical" force fields take a 
minimalist approach (16, 19, 20). For example, 
we would have only a single torsional term 
around an X-C-C-Y bond except when X or Y 
are electronegative, where another term can 
be rationalized from electronic effects and can 
be derived directly using quantum mechanical 
calculations. This helps our model to be more 
easily generalized to new molecules, albeit in 
some cases probably at the cost of some accu- 
racy. Exceptions to this minimalist approach 
are the $\psi$, $\phi$ of peptides and $\chi$ of nucleic acids,
where more terms were added to ensure as 
accurate as possible a reproduction of the con- 
formational energies around these key bonds. 

Finally, to ensure reasonable representa- 
tion of bond and angle terms, we use empirical 
data (structures and vibrational frequencies). 
The use of this simple harmonic model pre- 
cludes high accuracy, but in our opinion, one 
would compromise the simplicity and general- 
ity of the model with more complex functional 
forms. 
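
The individual terms of Equation 4.1 are simple enough to evaluate directly. The following Python sketch is illustrative only: the function names and all numerical inputs are hypothetical and are not taken from any published parameter set. It assumes energies in kcal/mol, distances in Å, angles in radians, charges in electron units, and the usual Coulomb conversion factor of about 332 kcal·Å/(mol·e²).

```python
import math

# Assumed conversion factor so that q_i*q_j/R_ij (e^2/Angstrom) comes out in kcal/mol.
COULOMB = 332.06


def bond_energy(r, k_r, r_eq):
    """Harmonic bond term K_r (r - r_eq)^2 of Equation 4.1."""
    return k_r * (r - r_eq) ** 2


def angle_energy(theta, k_theta, theta_eq):
    """Harmonic angle term K_theta (theta - theta_eq)^2."""
    return k_theta * (theta - theta_eq) ** 2


def dihedral_energy(phi, v_n, n, gamma):
    """Single Fourier term (V_n / 2)(1 + cos[n*phi - gamma])."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))


def lj_coefficients(r_min, well_depth):
    """Convert a radius/well-depth pair into the A_ij and B_ij of the 12-6 term.

    With these coefficients the 12-6 energy has its minimum, -well_depth, at r_min.
    """
    return well_depth * r_min ** 12, 2.0 * well_depth * r_min ** 6


def pair_energy(r_ij, a_ij, b_ij, q_i, q_j, eps=1.0):
    """van der Waals plus electrostatic energy of one atom pair."""
    van_der_waals = a_ij / r_ij ** 12 - b_ij / r_ij ** 6
    electrostatic = COULOMB * q_i * q_j / (eps * r_ij)
    return van_der_waals + electrostatic
```

The total energy is the sum of these terms over all bonds, angles, dihedrals, and nonbonded atom pairs. Expressing the van der Waals term through a radius and well depth mirrors the one-parameter-set-per-atom-type convention described above.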

3.2 Force Field Models for Simple Liquids 

A key test of this approach is the ability to
accurately reproduce liquid structures and energies
and free energies of solvation; these
have traditionally been considered key elements
in the development of successful force
fields for liquids (25). The aqueous solvation
free energies of a large number of molecules,
including substituted benzenes, methanol, hydrocarbons,
N-methylacetamide, and dimethyl
sulfide, as well as the liquid structure
and energy of methanol and N-methylacetamide,
show good agreement with experiment,
with little or no adjustment of parameters.
For example, Fox and Kollman (25) have
shown that this approach leads to a density
and enthalpy of vaporization of liquid dimethyl
sulfoxide (DMSO) within 2% of experiment,
using restrained electrostatic potential
(RESP) charges and van der Waals parameters
taken without modification from the corresponding
values in proteins. Similar results
have been obtained for other organic liquids.

3.3 Nonadditive and More Complex Models 

What are the most important weaknesses in 
the above-described parameterizational ap- 
proach and the use of Equation (4.1)? In our 
opinion, the main ones are the use of an effec- 
tive two-body potential and the use of only 
atom-centered charges. 

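$$
E_{pol} = -\frac{1}{2} \sum_{i}^{\text{atoms}} \mu_i \cdot E_i^{(0)} \qquad \text{(polarization)} \tag{4.2}
$$

where $\mu_i = \alpha_i E_i$ is the dipole induced on atom $i$ by the
local field $E_i$, $E_i^{(0)}$ is the field of the permanent charges,
and $\alpha_i$ is the atomic polarizability. Substantial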
progress has been made in laying the foundation
for the development of a complete force
field including explicit nonadditive effects
(adding Equation 4.2 to Equation 4.1). First,
we have shown that such models, in contrast
to additive models, lead to good agreement
with experimental solvation free energies of
representative organic ions CH₃NH₃⁺ and
CH₃CO₂⁻ without any adjustment of van der
Waals parameters (26). Second, we have
shown that such nonadditive terms are essential
in accurately describing cation-π interactions
(27). Third, we have shown that one can
equally well describe liquid CH₃OH and
N-methylacetamide (NMA) with additive models
or a nonadditive model in which the
charges are uniformly reduced (by 0.88) (28).
Finally, the interaction free energy of Li⁺ with
hexaanisole spherand is more accurately described
by nonadditive than additive molecular
mechanical models (29). In addition, considering
off-center charges in electrostatic
potential fit models of atoms with "lone pairs"
shows that they can often be important in
leading to a very accurate description of H bond
directionality (30).

3.4 Long Range Electrostatic Effects 

To accurately describe the energy and structure
of complex systems, not only are the functional
form and parameters of molecular models
described by Equations 4.1 and 4.2
important, but also the manner in which the
long range electrostatic effects are represented.
The standard approach is to use a nonbonded
cutoff for both electrostatic and van
der Waals interactions, which seems to be a
reasonable method for proteins but a poor
method for describing highly charged
molecules such as nucleic acids. For periodic
systems, Ewald methods (which are too complex
to be described here) have been known for
a long time to remove most of the artefacts
arising from cutoffs, and impressive efficiency
and accuracy of a variant called particle-mesh
Ewald (PME) has been demonstrated for protein
crystals (31) [0.3 Å rms deviation from the
observed crystal structure for bovine pancreatic
trypsin inhibitor (BPTI) in a 1-ns simulation
with an increase in computer time of only
~50% over standard cutoff methods]; the
PME method also leads to accurate simulations
of proteins, DNA, and RNA in solution
(32).

4 THERMODYNAMICS OF ASSOCIATION 

We have focused mainly on the energy of association
between molecules; in any drug-receptor
interaction, we typically want to know
the equilibrium constant for association
and the free energy of association $\Delta G^\circ$. The
difference between the free energy ($\Delta G^\circ$) and
energy ($\Delta E^\circ$) of association is given by
$\Delta G^\circ = \Delta H^\circ - T\Delta S^\circ$ and $\Delta H^\circ = \Delta E^\circ + \Delta(PV)$.
For gas phase associations, $\Delta(PV)$ is $\approx -RT$,
which is −0.6 kcal/mol at room temperature.
Thus, this term, when added to $\Delta E$, favors association
(the more negative $\Delta G$, the greater the
tendency for association). However, $\Delta S$, the
entropy of association, is typically large and
negative. The reason is that one is reducing
the "floppy" degrees of freedom, which have
large translational and rotational entropies,
by six (six translations and six rotations in the
free molecules, three of each in the complex)
during complex formation, and replacing
these with vibrations, which have lower entropies
(33).

4.1 Gas Phase Association 

For example, at 300 K, two CH₄ molecules
have a translational entropy of 69 eu [entropy
units, cal/(mol·K)] and a rotational entropy of 31
eu, whereas (CH₄)₂ has a translational entropy
of 37 eu and a rotational entropy (assuming
a C. . .C distance of 4 Å) of 22 eu. Thus,
one can see that the translational and rotational
entropy contribution to the reaction
2CH₄ → (CH₄)₂ is −41 eu. These six degrees of
freedom become vibrations in the complex
(CH₄)₂, and as such, might contribute a vibrational
entropy of about 20-30 eu. Thus, for the
dimerization of CH₄ in the gas phase, we expect
$T\Delta S^\circ$ of about −3 to −6 kcal/mol at 300 K.
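
The arithmetic behind this estimate can be retraced in a few lines. The sketch below is only a bookkeeping aid; the function names are ours, and the entropies are the values quoted in the text, in cal/(mol·K).

```python
CAL_PER_KCAL = 1000.0


def association_entropy(s_trans_free, s_rot_free, s_trans_complex, s_rot_complex):
    """Translational plus rotational entropy change on association, in eu."""
    return (s_trans_complex + s_rot_complex) - (s_trans_free + s_rot_free)


def t_delta_s(delta_s_eu, temperature_k):
    """Convert an entropy change in eu into T*dS in kcal/mol."""
    return temperature_k * delta_s_eu / CAL_PER_KCAL


# Values quoted in the text for 2 CH4 -> (CH4)2 at 300 K.
ds_trans_rot = association_entropy(69.0, 31.0, 37.0, 22.0)

# The new vibrations are estimated to return 20-30 eu.
tds_with_20_eu = t_delta_s(ds_trans_rot + 20.0, 300.0)
tds_with_30_eu = t_delta_s(ds_trans_rot + 30.0, 300.0)
```

With a vibrational contribution of 20 eu the net entropy change is −21 eu, and with 30 eu it is −11 eu; multiplying by 300 K gives the −6 to −3 kcal/mol range quoted above.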

As stressed in the second law of thermody- 
namics, the tendency for a chemical process to 
occur is governed both by the energy released 
(exothermicity) in the process and the entropy 
gained (the tendency of the reaction to go to a 
more random, disordered state). In the case of 
gas phase association, the energy term is in- 
variably exothermic if the reactants approach 
each other in an appropriate orientation, and 
the entropy term is always negative, opposing 
association. Table 4.3 gives an example of the 
thermodynamics of association of water mole- 
cules in the gas phase. As one can see, the 
entropy ($\Delta S^\circ$) contribution to association of
water molecules in the gas phase is substantial 
and negative; thus, there is little tendency for 
water molecules to associate in the gas phase 
at room temperature and 1 atm pressure, even 
though the hydrogen bond energy is about 5 
kcal/mol. 
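
To make the consequence of Table 4.3 concrete, the sketch below combines the tabulated enthalpy and entropy terms and converts the resulting free energy into an association constant. It assumes that the tabulated entropy entry, which is given in kcal/mol, is the product $T\Delta S^\circ$ at 300 K, and that the standard state is 1 atm; the function names are ours.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def free_energy(delta_h, t_delta_s):
    """dG = dH - T*dS, all in kcal/mol."""
    return delta_h - t_delta_s


def equilibrium_constant(delta_g, temperature_k):
    """K = exp(-dG / RT) for the chosen standard state."""
    return math.exp(-delta_g / (R_KCAL * temperature_k))


# Table 4.3 entries for 2 H2O -> (H2O)2 at 300 K; -9.0 is read as T*dS (assumption).
delta_g_300 = free_energy(-5.2, -9.0)
k_assoc = equilibrium_constant(delta_g_300, 300.0)

# RT at 300 K, for comparison with the -0.6 kcal/mol d(PV) term quoted earlier.
rt_300 = R_KCAL * 300.0
```

Under these assumptions the free energy reproduces the tabulated +3.8 kcal/mol, and the association constant is well below 1, consistent with the statement that water shows little tendency to dimerize in the gas phase at room temperature.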

4.2 Solvation Effects 

The thermodynamic cycle (Fig. 4.2) illustrates 
the problems we face in transferring our 
knowledge of gas phase intermolecular inter- 
actions to solution phase phenomena. 

Table 4.3 Thermodynamic Functions for Gas Phase Association of
Water Molecules: 2H₂O → (H₂O)₂

Thermodynamic Function     Value for H₂O Dimerization (kcal/mol)
ΔE° (0 K) [a]              −6.2
ΔE° (300 K) [a]            −4.2
ΔH° (300 K) [a]            −5.2
TΔS° (300 K) [b]           −9.0
ΔG° (300 K)                +3.8

[a] See Joesten and Schaad (13).
[b] Estimated using the vibration frequencies employed by Joesten and Schaad (14).

Our real interest is in $\Delta G_4$, the solution
phase free energy of association. Until now,
our discussion has focused on the energy
($\Delta E_1$), enthalpy ($\Delta H_1$), and free energy ($\Delta G_1$)
of association in the gas phase. To be able to
calculate $\Delta G_4$, we need to know $\Delta G_3$, the solvation
free energy of the drug-receptor complex;
$\Delta G_{2D}$, the solvation free energy of the
drug; and $\Delta G_{2R}$, the solvation free energy of
the receptor. These solvation free energies are
the free energies gained (or lost) by taking the
molecule from a standard concentration in the
gas phase to a corresponding concentration in
solution. Using the thermodynamic cycle in
Fig. 4.2, it follows that

$$
\Delta G_4 = \Delta G_1 - \Delta G_{2D} - \Delta G_{2R} + \Delta G_3 \tag{4.3}
$$

Similar relationships hold for $\Delta H_4$ and $\Delta S_4$.
There is no reason to expect $\Delta G_4$ and $\Delta G_1$ to be
similar, so we face the problem of estimating
$\Delta G_{2D}$, $\Delta G_{2R}$, and $\Delta G_3$. We cannot measure
$\Delta G_{2R}$ or $\Delta G_3$, because this would require us to
vaporize a measurable amount of a receptor or
drug-receptor complex. For most polar and
ionic drugs, $\Delta G_{2D}$ is not measurable either.
Therefore, one resorts to measuring the free
energy of transfer from octanol to water,
$\Delta G_{2D}(\text{oct})$, rather than the free energy of
transfer from the gas phase to water, $\Delta G_{2D}$.
This situation underlies the postulate of the
Hansch approach (34), which suggests that
the differences in $\Delta G_{2D}(\text{oct})$ [$\Delta\Delta G_{2D}(\text{oct})$] may
be related to the biological activity of drugs,
and in many cases this desolvation (water →
octanol) does indeed seem to be related to drug
binding and/or biological activity.

Because the individual free energies in
Equation 4.3 are so hard to measure, one is led
to smaller model systems to analyze the major
driving force for drug-receptor association, a
step taken by Kauzmann (35) in his classic
paper on the forces that affect protein stability
and structure. He examined the thermodynamics
of association and solution of small
nonpolar molecules in aqueous solution. The
associations were characterized by a large positive
entropy term and the solution by a large
negative entropy, with the enthalpy terms less
important. Thus, the well-known lack of solubility
of hydrocarbons in water was not caused
by a net loss of hydrogen bonds; the hydrocarbons
cause the water molecules to become
more ordered (thus to lose entropy) so that
they can still find a good hydrogen bond partner
($\Delta H$ of solution of these hydrocarbons is
often negative, but much smaller in magnitude
than the $T\Delta S$ of solution). By coming together
in aqueous solution, these hydrocarbons
"release" some H₂O's, and this favorable
$T\Delta S$ of association is the driving force for this
association. It is generally agreed that this
"hydrophobic" effect of hydrocarbon groups is
a key feature in many drug-receptor associations.
A lucid description of hydrophobic
forces is given by Jencks (36) and Dill (37).

Computer simulation approaches have 
proven very useful in enabling calculation of 
the association of molecules. For example, the 
association of two methane molecules in the 
gas phase would lead to a $\Delta E^\circ$ (0 K) of about −0.5
kcal/mol and, by analogy with the water dimer (Table
4.3), a very positive $\Delta G^\circ$ (300 K) and thus
no tendency for association. In aqueous solu- 
tion, one can calculate, using modern statisti- 
cal mechanical simulation methods, the po- 
tential of mean force for association of two 
molecules, which is the free energy as a function
of molecular separation in solution. Although
there is some controversy about
whether there are both "solvent-separated"
and "contact" minima for two methane molecules
in aqueous solution, there is no question
that methane association is quite attractive in
aqueous solution compared with the gas phase
(38).

                 ΔG1
   D(g) + R(g) --------> DR(g)
    |       |              |
    | ΔG2D  | ΔG2R         | ΔG3
    v       v              v
   D(aq) + R(aq) ------> DR(aq)
                 ΔG4

Figure 4.2. A schematic representation of the
thermodynamic cycle for molecular association in
the gas phase and in solution.

One can also apply such approaches to 
study association of ionic and polar molecules. 
For example, the association of Na⁺ and Cl⁻
has a free energy of association that is very
small in magnitude, in contrast to the gas
phase (39). The association of two amides
through a C=O . . . H-N hydrogen bond is
very favorable in vacuo and progressively less 
favorable in non-polar and aqueous solution 
(40). Thus, water has a significant "leveling" 
effect on association, making nonpolar associ- 
ations more favorable and ionic and polar as- 
sociations less favorable than their gas phase 
counterparts. 

Let us now summarize the foregoing discussion.
Unlike the gas phase association,
where $\Delta H_1$ and $\Delta S_1$ are invariably negative,
for the corresponding thermodynamics in solution,
$\Delta H_4$ and $\Delta S_4$ can be of either sign. The
enthalpy of association $\Delta H_4$ of two molecules
in solution will be positive if the interactions of
the solvent with the uncomplexed drug and
receptor are sufficiently stronger and more
exothermic ($\Delta H_3 - \Delta H_{2D} - \Delta H_{2R}$ is more positive
than $\Delta H_1$ is negative) than are the interactions
of the solvent with the drug-receptor
complex. Similarly, the entropy of association
in solution $\Delta S_4$ can be positive if
$\Delta S_3 - \Delta S_{2D} - \Delta S_{2R}$ is more positive than $\Delta S_1$ is negative.
This can come about if the entropy gain from
release of solvent from its interaction with the
isolated drug and receptor is sufficiently
larger than the entropy gain from release of
solvent from the drug-receptor complex.
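
The sign argument above applies to any state function carried around the cycle of Fig. 4.2. The sketch below illustrates it for an enthalpy; the function names are ours and the input numbers are purely hypothetical, chosen only to show how a favorable gas phase term can be overturned by solvation.

```python
def solvent_correction(solvation_drug, solvation_receptor, solvation_complex):
    """X3 - X2D - X2R: net change in solvation on forming the complex."""
    return solvation_complex - solvation_drug - solvation_receptor


def solution_value(gas_phase, solvation_drug, solvation_receptor, solvation_complex):
    """X4 = X1 - X2D - X2R + X3 for a state function X (G, H, or S)."""
    return gas_phase + solvent_correction(
        solvation_drug, solvation_receptor, solvation_complex
    )


# Hypothetical enthalpies in kcal/mol, for illustration only.
dh1 = -10.0   # gas phase association, exothermic
dh2d = -12.0  # solvation of the free drug
dh2r = -30.0  # solvation of the free receptor
dh3 = -28.0   # solvation of the complex

correction = solvent_correction(dh2d, dh2r, dh3)
dh4 = solution_value(dh1, dh2d, dh2r, dh3)
```

Here the free partners are better solvated than the complex, so the correction is more positive than the gas phase term is negative and the association becomes endothermic in solution.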

An additional important point to keep in 
mind is that the solution phase thermodynam- 
ics may be dominated (as in the case of the 
hydrophobic effect, the association of nonpo- 
lar solutes in water) by changes in solvent- 
solvent interactions in the presence of solute. 

It is also important to stress that even an
analysis of the relative contributions of $\Delta H$
and $\Delta S$ to $\Delta G$ may not give definitive insight
into the "nature" of the drug-receptor bond.
For example, a large positive $\Delta S$ (and small
negative $\Delta H$) for association might come from
either a hydrophobic or an ionic association
(35). In either case, the driving force for association
is likely "release" of H₂O from "tight"
binding to the solute.

One final consideration in determining either
gas phase or solution phase association
constants of drug-receptor complexes is conformational
flexibility. Medicinal chemists
have often attempted to synthesize rigid drugs
of different stereochemistries in the hopes of
finding one that fits "perfectly" into the receptor
site. If, for example, the drug has three
equal energy conformations and only one can
fit the receptor site, a price must be paid of
$\Delta G = +RT \ln 3$ in binding free energy relative
to the drug that is "locked" in the right conformation.
If the receptor has to be locked in a
conformation to "accept" the drug, one must
pay a similar free energy price. A nice example
of the latter situation is the difference in binding
free energies between "locked" and "unlocked"
macrocyclic crown ethers (41) that
bind the t-BuNH₃⁺ cation.
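
The size of this conformational penalty is easy to estimate. The sketch below assumes n conformations of exactly equal energy, only one of which binds, and a temperature of 300 K; the function name is ours.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def conformational_penalty(n_conformers, temperature_k=300.0):
    """Free energy cost RT ln(n) of selecting one of n isoenergetic conformers."""
    if n_conformers < 1:
        raise ValueError("need at least one conformer")
    return R_KCAL * temperature_k * math.log(n_conformers)


penalty_three = conformational_penalty(3)
penalty_locked = conformational_penalty(1)
```

For three conformers the cost is well under 1 kcal/mol, which corresponds to a threefold loss in association constant; a drug already locked in the binding conformation pays nothing.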

4.3 An Illustrative Example: Protonation 
of Amines 

Before we turn to some examples of drug-receptor
interactions, let us present a specific
example of the difference between gas phase
and solution interactions. We choose the protonation
of amines, because of the large literature
that attempted to explain the irregular
order of the pKa's of the alkyl amines [NH₃ =
9.25; CH₃NH₂ = 10.66; (CH₃)₂NH = 10.73;
and (CH₃)₃N = 9.81]. This reaction can be represented
as

$$
R_3N(g) + H^+(g) \overset{\Delta G_1}{\rightleftharpoons} R_3NH^+(g)
$$

in the gas phase and

$$
R_3N(aq) + H^+(aq) \overset{\Delta G_4}{\rightleftharpoons} R_3NH^+(aq)
$$

in aqueous solution. As we noted in connection
with Fig. 4.2, the difference between the free
energies of protonation in solution and the gas
phase is given by

$$
\Delta G_4 - \Delta G_1 = -\Delta G_2(R_3N) - \Delta G_2(H^+) + \Delta G_3(R_3NH^+)
$$

Recall also that the solution $pK_a = -\log K_a
= -\Delta G_4^\circ / 2.3RT$. When the gas phase basicities
were measured and showed a regular order, it
was clear that the irregular order in solution
was caused by a solvation effect. In the gas
phase, NH₃ is a weaker base than (CH₃)₃N by
about 23 kcal/mol; in solution this difference is
only about 1 kcal/mol.

Table 4.4 Free Energies in Cycle (Fig. 4.2) for Protonation of Alkyl Amines (kcal/mol) [a]

R₃N         ΔG1       ΔG2(H⁺)   ΔG2(R₃N)   ΔG3(R₃NH⁺)   ΔG4
NH₃         −198.0    269.8     −2.41      −78.0        −3.79
CH₃NH₂      −210.0    269.8     −2.68      −67.7        −5.22
(CH₃)₂NH    −216.6    269.8     −2.41      −61.0        −5.39
(CH₃)₃N     −220.8    269.8     −1.34      −54.4        −4.06

[a] See Aue et al. (42).

Table 4.4 lists the free energies appropriate 
to the thermodynamic cycle (Fig. 4.2) for the 
protonation of the amines. Two points deserve 
strong emphasis. 

1. The magnitude of $\Delta G_4$ is much smaller
than that of $\Delta G_1$ for protonation, because
in aqueous solution, the amines must compete
with H₂O for the proton; in the gas
phase there is no competition.

2. As clearly analyzed by Aue et al. (42), the 
smaller the protonated amine, the more ef- 
fectively solvated it is, and the better base 
it becomes compared with its relative rank 
in the gas phase. 
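
The entries of Table 4.4 can be checked against the thermodynamic cycle. The sketch below is a consistency check, not part of the original analysis: it assumes that the proton's solvation free energy enters the cycle as a negative quantity (−269.8 kcal/mol), although the table lists the magnitude without a sign. The dictionary and function names are ours.

```python
# (dG1, dG2 of amine, dG3 of ammonium ion, tabulated dG4), kcal/mol, from Table 4.4.
TABLE_4_4 = {
    "NH3": (-198.0, -2.41, -78.0, -3.79),
    "CH3NH2": (-210.0, -2.68, -67.7, -5.22),
    "(CH3)2NH": (-216.6, -2.41, -61.0, -5.39),
    "(CH3)3N": (-220.8, -1.34, -54.4, -4.06),
}

# Assumption: the tabulated 269.8 is the magnitude of a negative solvation free energy.
DG2_PROTON = -269.8


def dg4_from_cycle(dg1, dg2_amine, dg3_ammonium, dg2_proton=DG2_PROTON):
    """dG4 = dG1 - dG2(R3N) - dG2(H+) + dG3(R3NH+)."""
    return dg1 - dg2_amine - dg2_proton + dg3_ammonium


computed = {
    amine: dg4_from_cycle(dg1, dg2, dg3)
    for amine, (dg1, dg2, dg3, _) in TABLE_4_4.items()
}

# Basicity gap between NH3 and (CH3)3N in the gas phase and in solution.
gap_gas = TABLE_4_4["NH3"][0] - TABLE_4_4["(CH3)3N"][0]
gap_solution = computed["NH3"] - computed["(CH3)3N"]
```

With that sign convention the cycle reproduces every tabulated $\Delta G_4$, and the gas phase gap of about 23 kcal/mol between NH₃ and (CH₃)₃N shrinks to less than 1 kcal/mol in solution.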

5 CALCULATING FREE ENERGIES 

Free energy is certainly one of the most impor- 
tant concepts in physical chemistry. The 
groundwork on calculating free energies was 
laid by Kirkwood (43) and Zwanzig (43), and 
the first key "modern" developments and ap- 
plications came from the work of Postma et al. 
(44), Jorgensen and Ravimohan (45), Tembe 
and McCammon (46), and Warshel (47). The 
fundamentals of computational approaches to 
calculating free energies are reviewed by Bev- 
eridge and Mezei (48), and we attempted to 
exhaustively review applications up to 1993 
(49). 



To calculate the relative solvation free energies
of molecules A and B in solvent S, we
can use a thermodynamic cycle such as in Fig.
4.3. The relative solvation free energy of A and
B, determined experimentally, is $\Delta\Delta G_{solv} =
\Delta G_{solv}(B) - \Delta G_{solv}(A)$, and because the free
energy is a state function, $\Delta\Delta G_{solv}$ is also
$= \Delta G_{mut}(S) - \Delta G_{mut}(g)$, which are the free energies
determined by computational means by "mutating"
the molecular mechanical model of A
into B in solvent S and in the gas phase (g). Of
course, if B consists of all "dummy" (non-interacting)
atoms, this approach leads to the
calculation of the absolute solvation free energy
of A.
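
As a minimal sketch of how a "mutation" free energy can be estimated, the code below applies the exponential averaging formula associated with Zwanzig, $\Delta G = -kT \ln \langle \exp(-\Delta U / kT) \rangle_A$, to a list of energy differences $\Delta U = U_B - U_A$ sampled from a simulation of state A. The function names and sample values are hypothetical. Practical calculations usually divide the mutation into many small steps; this single-step version only illustrates the principle.

```python
import math

K_B = 0.0019872  # Boltzmann constant per mole, kcal/(mol*K)


def fep_free_energy(delta_u_samples, temperature_k=300.0):
    """Exponential-averaging estimate of dG from sampled U_B - U_A (kcal/mol)."""
    if not delta_u_samples:
        raise ValueError("need at least one sample")
    kt = K_B * temperature_k
    scaled = [-du / kt for du in delta_u_samples]
    # Shift by the largest exponent before averaging, for numerical stability.
    shift = max(scaled)
    mean_exp = sum(math.exp(s - shift) for s in scaled) / len(scaled)
    return -kt * (shift + math.log(mean_exp))


def relative_solvation_free_energy(dg_mut_solvent, dg_mut_gas):
    """ddG_solv = dG_mut(S) - dG_mut(g), from the cycle of Fig. 4.3."""
    return dg_mut_solvent - dg_mut_gas


# Hypothetical energy differences sampled in solvent and in the gas phase.
dg_mut_s = fep_free_energy([1.2, 0.8, 1.5, 0.9, 1.1])
dg_mut_g = fep_free_energy([2.9, 3.1, 3.0, 3.2, 2.8])
ddg_solv = relative_solvation_free_energy(dg_mut_s, dg_mut_g)
```

Because the exponential average weights low-energy samples most heavily, the estimate never exceeds the plain average of the sampled energy differences.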

Being able to accurately calculate free energies
of solvation suggests a reasonable balance
in solute-solvent and solvent-solvent interactions.
The next key challenge is to
calculate $\Delta\Delta G_{bind}$ of guests G and G' to a host
H, all in aqueous (or other) solution.

A typical cycle for free energy calculations
(45) where H is a host, G is a guest, and HG is
the host-guest complex is given in Fig. 4.4.

Now one requires a correct balance of solute
(host)-solute (guest), solute (host or guest)-solvent,
and solvent-solvent interactions to
correctly calculate $\Delta\Delta G_{bind}$, although there
clearly can be compensating errors in the calculation
of $\Delta G_{solv}$ and $\Delta G_{bind}$.



            ΔG_mut(g)
   A(g) ---------------> B(g)
    |                     |
    | ΔG_solv(A)          | ΔG_solv(B)
    v                     v
   A(S) ---------------> B(S)
            ΔG_mut(S)

Figure 4.3. Basic thermodynamic cycle for solvation
free energy.

              ΔG1
   H + G  ----------> HG
     |                 |
     | ΔG_solv         | ΔG_bind
     v                 v
   H + G' ----------> HG'
              ΔG2

   ΔΔG_bind = ΔG_bind − ΔG_solv = ΔG2 − ΔG1

Figure 4.4. Thermodynamic cycle for host-guest
interactions.



6 EXAMPLES OF DRUG-RECEPTOR 
INTERACTIONS 

We discuss three examples of "drug target"
interactions: (1) biotin-avidin, (2) dihydrofolate
reductase-trimethoprim, and (3) DNA-intercalator.
The first is the strongest characterized
protein-ligand association, the second a
prototype enzyme-inhibitor interaction, and
the third describes drugs interacting with nucleic
acids.



6.1 Biotin-Avidin

Biotin (Fig. 4.5) is involved in the strongest
known non-covalent macromolecule-ligand
interaction. In fact, given the small size of biotin,
it is surprising to many that this association
is so strong ($K_a$ corresponding to
a $\Delta G$ of ~20 kcal/mol) (50). The X-ray structure
of the complex of biotin with streptavidin (a
protein related to avidin, with nearly as large a
biotin affinity) has been solved (51). The ureido
group of biotin was thought to be the reason
for the uniquely strong binding of this ligand
to (strept)avidin.

Figure 4.5. Structures of biotin and two analogs.

We have carried out free energy calculations
(13) on the relative binding of biotin,
iminobiotin, and thiobiotin to streptavidin, as
well as absolute free energy calculations of biotin
binding. The results of these simulations
are instructive in the insight they give us into
this association. These free energy calculations
can best be understood by considering the
thermodynamic cycle in Fig. 4.6. The free
energy calculations enable one to determine
the free energies of the vertical processes by
mutating one ligand into another in solution
($\Delta G_{solv}$) and when bound in the active site
($\Delta G_{prot}$), using molecular dynamics to create
an ensemble average of the system. The difference
between these calculated free energies,
$\Delta\Delta G_{bind}$, is equal to the difference in the observed
relative free energies of ligand binding.

              ΔG_bind1
   P + L1 ------------> (P-L1)
     |                    |
     | ΔG_solv            | ΔG_prot
     v                    v
   P + L2 ------------> (P-L2)
              ΔG_bind2

   ΔΔG_bind = ΔG_bind2 − ΔG_bind1 = ΔG_prot − ΔG_solv

Figure 4.6. Thermodynamic cycle for protein-ligand
interactions. The experimentally measurable
free energies are ΔG_bind1 and ΔG_bind2 (horizontal),
and the calculated values (ΔG_solv and ΔG_prot) are the
vertical processes.

The biotin-streptavidin system provides a
"textbook case" of the relative free energies of
the binding of biotin, iminobiotin, and thiobiotin,
as illustrated in Table 4.5. First, the calculated
relative free energies are in reasonable
agreement with experiment; thiobiotin is calculated
and observed to bind about 3 to 4 kcal/mol
more weakly to streptavidin than biotin,
and iminobiotin is calculated and observed to
bind about 6 to 7 kcal/mol more weakly than
biotin. What is more interesting are the energy
components. Thiobiotin is easier to desolvate
than biotin by ~9 kcal/mol ($\Delta G_{solv}$) but
interacts more weakly with the protein by ~13
kcal/mol, leading to the observed ~4 kcal/mol
preference for biotin binding. On the other
hand, iminobiotin is ~5 kcal/mol harder to desolvate
than biotin, but interacts only ~2 kcal/mol
more weakly with streptavidin, thus leading
($\Delta\Delta G_{bind} = \Delta G_{prot} - \Delta G_{solv} = 2 - (-5)$) to
its ~7 kcal/mol weaker binding to streptavidin
than biotin. The above examples illustrate the
interesting tradeoff in binding and solvation
effects in analysis of ligand-macromolecule interactions.
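
The bookkeeping of the cycle in Fig. 4.6 can be applied directly to the entries of Table 4.5. The sketch below recomputes the relative binding free energies and, as an added illustration not given in the text, converts them into ratios of association constants at an assumed temperature of 300 K. The dictionary and function names are ours.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

# (dG_solv, dG_prot, experimental ddG_bind), kcal/mol, from Table 4.5.
TABLE_4_5 = {
    "biotin -> thiobiotin": (8.8, 12.0, 3.6),
    "biotin -> iminobiotin": (-5.3, 1.2, 6.2),
}


def relative_binding_free_energy(dg_solv, dg_prot):
    """ddG_bind = dG_prot - dG_solv (vertical legs of the cycle)."""
    return dg_prot - dg_solv


def affinity_ratio(ddg_bind, temperature_k=300.0):
    """Factor by which the association constant drops for a given ddG_bind."""
    return math.exp(ddg_bind / (R_KCAL * temperature_k))


calculated = {
    name: relative_binding_free_energy(dg_solv, dg_prot)
    for name, (dg_solv, dg_prot, _) in TABLE_4_5.items()
}
ratios = {name: affinity_ratio(ddg) for name, ddg in calculated.items()}
```

The same final number can arise from quite different components: for thiobiotin both legs are large and positive and nearly cancel, whereas for iminobiotin a modest loss in the protein is reinforced by a harder desolvation.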

The fact that one loses only 4-7 kcal/mol
out of the ~20 kcal/mol in free energy of binding
when mutating the ureido group to its thio
and imino analogs strongly suggests that
the "ureido resonance," suggested by the crystallographers
(50) who solved the structure as
the reason for the unusually high $K_a$, cannot
be the main reason. Calculations on the absolute
free energy of biotin-streptavidin binding
suggest that electrostatic effects, which might
include ureido resonance (although perhaps
not all of it), contribute ~6 kcal/mol to
$\Delta\Delta G_{bind}$, whereas van der Waals effects contribute
~14 kcal/mol.

The large contribution of van der Waals interactions
(dispersion plus exchange repulsion)
is surprising to many, because an individual
van der Waals atom-atom dispersion
attraction is very small. But there are many of
them in the streptavidin active site, which, not
coincidentally, contains four tryptophan residues.

But why don't the van der Waals interactions
with water lost when one moves biotin
from water to the streptavidin active site cancel
with those gained in the active site? This
can be understood by noting, as Sun et al. (52)
and Rao and Singh (53) have, that a unique
aspect of water as a solvent is its large exchange
repulsion contribution to $\Delta G_{solv}$. This
exchange repulsion contribution represents
the "hydrophobic effect," the fact that methane
is less stable by 2 kcal/mol at a 1 M standard
state in water than in the gas phase. This
exchange repulsion cancels (and sometimes
outweighs) the dispersion attraction that occurs
for any solute when transferred from the
gas phase to a condensed phase. On the other
hand, in the streptavidin binding site, preorganized
during protein synthesis, one gains
dispersion attraction when biotin binds without
the compensation from exchange repulsion.
The magnitude of this effect is heightened
by the large "atom density" both in
biotin, with its bicyclic structure, and in
streptavidin, with its four tryptophan residues.
Thus, the key aspect in biotin's tight
binding with (strept)avidin is the preorganization
and high atom density of the protein active
site (54).

Recently, Dixit and Chipot (55) have re-ex- 
amined this problem, using the improved 
power of modern computers to expand the 
sampling of configurations. The results con- 
tinue to be in good accord with experiments 
and offer a modern paradigm for how the con- 
vergence of these simulations can be moni- 
tored. 



Table 4.5 Results of Relative Free Energy Calculations [a] (kcal/mol)

                                                    ΔΔG_bind
Perturbation             ΔG_solv       ΔG_prot      Calc. (ΔG_prot − ΔG_solv)    Exp. [b] (ΔG_bind2 − ΔG_bind1)
Biotin → thiobiotin      8.8 ± 0.1     12.0 ± 0.3   3.2 ± 0.3                    3.6
Biotin → iminobiotin     −5.3 ± 0.1    1.2 ± 0.7    6.5 ± 0.8                    6.2

[a] Errors, where listed, correspond to half the hysteresis between forward and reverse runs.
[b] Experimental data, Ref. 51.

6.2 Dihydrofolate Reductase-Trimethoprim

A classic example of a drug that works by species-specific
protein inhibition is trimethoprim
(TMP). Because this drug binds to bacterial
dihydrofolate reductase (DHFR) ~10⁵ times
more tightly than to the mammalian enzyme,
there is a therapeutic concentration range in which
the drug can be used as an antibacterial with
little deleterious consequence for a mammalian
host.

DHFR was the first example where the X-ray
crystal structures of the enzyme complexes
have been solved for both bacterial and mammalian
enzymes. Matthews et al. (56) have suggested
that it is a key hydrogen bond involving
the pyrimidine ring of TMP, which is present in
the bacterial but not the mammalian enzyme complex,
that is responsible for the selectivity. This
has not been definitively established with carbocyclic
analogs, but analogs have clearly shown
an important role of the three methoxy groups
in TMP in causing species selectivity. For example,
the TMP analog without the three OCH₃
groups has a binding preference for the bacterial
enzyme of only ~10.

Kuyper (57) has analyzed the structures of the bacterial and
mammalian complexes and suggested that the oxygens of the
-OCH3 groups play a key role in species selectivity. The
methoxy oxygens are significantly more solvent exposed in the
bacterial complex than in the mammalian one. Thus, because
these oxygens do not form hydrogen bonds to enzyme groups in
either complex, the desolvation penalty for the oxygens is
smaller in the bacterial enzyme and does not as extensively
cancel the favorable hydrophobic/dispersion effects on
binding of the methoxy methyl groups. This interpretation is
supported by the fact that replacing the -OCH3 groups with
CH2CH3 makes the molecules less species selective; such
analogs bind only a little better to bacterial DHFR but
significantly better to mammalian DHFR (58, 59).

Free energy calculations and molecular dynamics have given,
and will continue to give, interesting insight into the
DHFR-TMP species selectivity (39-41).

6.3 Nucleotide Intercalator 

Because our first two examples have emphasized protein-small
molecule interactions, we turn to a nucleic acid-small
molecule interaction for our last example. There have been
many experimental studies of the "intercalation" of flat,
planar dyes into double-stranded DNA and other
polynucleotides.

The flexibility of the sugar-phosphate backbone allows the
intercalator to be sandwiched between the nucleotides with
relatively little "strain." The interaction of a wide variety
of intercalators with polynucleotides has been studied by
physicochemical techniques. The driving force for association
can be primarily hydrophobic, as in actinomycin D, where
association is driven by ΔS° (57), or it can contain a large
contribution from electrostatic effects, as in ethidium
bromide and adriamycin analogs, where association is driven
by ΔH° (60) (Table 4.6). Both molecules have binding
association constants to DNA of about 10^6. The role of
dispersion binding is not clear at this point, but it is
likely to be very important as well (13). As noted above, the
ability of these drugs to interfere with DNA replication is
apparently related to their rate of dissociation from DNA
rather than to their association constant K_a. Muller and
Crothers (2) showed that both actinomycin and actinomine had
similar values of K_a for DNA, but the former had a much
smaller dissociation rate and a much greater effect on the
rate of DNA replication.
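
The entries of Table 4.6 can be checked against the relation ΔG° = ΔH° - TΔS°, and an association constant can be estimated from ΔG° = -RT ln K_a. The following Python sketch does both; it assumes T = 298.15 K and that "eu" means cal/(mol K), and the variable and function names are illustrative rather than taken from the text.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol K)
T_KELVIN = 298.15   # 25 degrees C, as stated in the table footnote

# (dH in kcal/mol, dS in eu = cal/(mol K), dG in kcal/mol) from Table 4.6
table_4_6 = {
    "Proflavin": (-6.7, 4.7, -8.1),
    "Ethidium bromide": (-6.2, 9.4, -9.0),
    "Actinomycin D": (2.0, 39.0, -9.6),
    "Daunomycin": (-6.5, 7.7, -8.8),
}


def gibbs_free_energy(dH, dS_eu, temperature=T_KELVIN):
    """dG = dH - T*dS, converting dS from cal to kcal."""
    return dH - temperature * dS_eu / 1000.0


def association_constant(dG, temperature=T_KELVIN):
    """K_a = exp(-dG / RT)."""
    return math.exp(-dG / (R_KCAL * temperature))


for drug, (dH, dS, dG) in table_4_6.items():
    recomputed = gibbs_free_energy(dH, dS)
    k_a = association_constant(dG)
    driver = "entropy" if dH > 0 else "enthalpy"
    print(f"{drug}: dG = {recomputed:.2f} kcal/mol, "
          f"K_a ~ {k_a:.1e}, mainly {driver} driven")
```

The sign of ΔH° alone is used here as a crude label for the driving force; actinomycin D is the only entry with an unfavorable enthalpy, consistent with the entropy-driven binding described above.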

Table 4.6 Thermodynamics of Binding of Drugs to DNA^a

Drug               ΔH° (kcal/mol)   ΔS° (eu)   ΔG° (kcal/mol)
Proflavin          -6.7             +4.7       -8.1
Ethidium bromide   -6.2             +9.4       -9.0
Actinomycin D      +2.0             +39.0      -9.6
Daunomycin         -6.5             +7.7       -8.8

^a Conditions in all cases as follows: T = 25°C, 0.01 M buffer,
pH = 7, I = 0.015 [see Quadrifoglio and Crescenzi (60)].

7 SUMMARY

The foregoing examples illustrate the likely nature of
drug-receptor binding. It seems that hydrophobic and
dispersion binding do contribute a substantial amount to the
net binding affinity. However, we have noted some cases
(e.g., the ureido group in biotin and the intercalation of
positively charged groups into DNA) in which
there might be an important polar or electro- 
static driving force for binding. Again, it is diffi- 
cult to ascertain whether these polar contribu- 
tions come from "freeing up" water or from 
direct interactions, but they seem to contribute 
in a significant fashion to the driving force for 
association as well as being important in deter- 
mining biological specificity. The lessons for the 
medicinal chemist attempting to design a drug 
to maximize the drug receptor association in- 
clude the following: 

1. Conformational flexibility can decrease the 
association constants in a straightfor- 
wardly predictable way. 

2. Hydrophobic effects usually contribute sig- 
nificantly to drug-receptor association, but 
one must also consider possible specific po- 
lar and ionic interactions. 

3. Preorganization of the receptor or ligand is 
a key to obtaining optimal electrostatic or 
van der Waals interactions. 

We have tried to provide examples in this chapter both of the
qualitative arguments that are important for understanding
ligand-protein or ligand-DNA interactions and of some typical
numerical results arising from computer experiments.
Understanding these interactions is key to the rational
design of inhibitors, and a computer-aided approach is
increasingly being used to screen libraries of potential
inhibitors and to suggest improvements to lead compounds (61).
As force fields and sampling methods improve 
and as computers become ever-more powerful, 
the practical use of methods like these should 
improve as well. 

AUTHORS NOTE: 

Peter Kollman died unexpectedly in May, 
2001. He had authored an article on "Drug- 
Target Binding Forces" for the Fifth Edition 
of this series. This revision and extension for 
the Sixth Edition is based primarily on Peter's 
writings, and is dedicated to his memory. 

1 INTRODUCTION

1.1 Scope 

This chapter discusses molecular similarity and diversity
methods and their main applications to combinatorial library
design, the selection of compound subsets, and ligand-based
virtual screening. Protein structure-based virtual screening
is discussed in chapters 6 and 7. Medicinal
chemistry-relevant applications discussed include the design
of "diverse," "representative," and "thematic/focused/biased"
libraries and subsets. The last application is of particular
relevance, in that there is a recent trend to organize drug
discovery by "target class" or "gene family"; for example,
7-transmembrane G-protein-coupled receptors (7-TM GPCRs),
nuclear hormone receptors (NHRs), ion channels, proteases,
kinases, and phosphodiesterases.

1.2 Molecular Similarity/Diversity 

Molecular similarity and diversity methods 
have been developed based on the principle 
that similar molecules exhibit similar activi- 
ties/properties (1). Molecular similarity is a 
key concept in the identification of new mole- 
cules that have similar biological activity to 
one or more molecules of known activity. Mo- 
lecular diversity concepts are used to explore 
"chemical space," with the scope of applica- 
tion ranging from a particular structure/reac- 
tion to a large database of different molecules. 
The process of evaluating similarity and diversity involves
the calculation of descriptors for each structure and the
determination of the proximity of compounds within the
descriptor (or chemical) space. Virtual screening is the name
given to the process by which these computational methods are
used to identify a subset of compounds from a database for a
specific purpose. The source database may, for example, be
compounds in a corporate registry where the goal may be to
identify compounds for a biological assay. Alternatively, the
source database may be compounds that the chemist believes
are synthesizable and the goal of virtual screening is to
prioritize compounds for synthesis. Depending on the amount
of information available to guide the computational
screening, and the method used, different levels of
enrichment (number of actives selected in a set relative to a
random selection) are obtained. It should be noted also that
virtual screening applies not only to the selection of
compounds for biological screening but also to the
prioritization of compounds based on general properties of
biological relevance, for example, selecting compounds more
likely to be well absorbed.
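
The enrichment defined above can be written as the ratio of the hit rate in the selected subset to the hit rate expected from a random selection. A minimal Python sketch follows; the function name and all of the counts in the example are hypothetical and serve only to illustrate the calculation.

```python
def enrichment_factor(actives_selected, n_selected, actives_total, n_total):
    """Hit rate in the selected subset divided by the random hit rate."""
    hit_rate_selected = actives_selected / n_selected
    hit_rate_random = actives_total / n_total
    return hit_rate_selected / hit_rate_random


# Hypothetical example: a database of 100,000 compounds containing
# 200 actives; a virtual screen selects 1,000 compounds, 30 of them active.
ef = enrichment_factor(actives_selected=30, n_selected=1000,
                       actives_total=200, n_total=100_000)
print(f"enrichment over random selection: {ef:.1f}-fold")
```

An enrichment of 1 corresponds to a method that does no better than random selection.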

Molecules are typically represented by a vector of
real-valued properties (molecular weight, log P, etc.) or
binary values (e.g., 0 for absence, 1 for presence of a
substructure feature) in a bit-string or binary fingerprint.
The term fingerprint or key or signature thus refers to an
encoding of features/characteristics a molecule exhibits
(e.g., substructures present, all possible combinations of
2-4 pharmacophoric features) as a string of bits (indicating
either the presence or absence of a particular
characteristic; see section 2.1.1 and Fig. 5.1), optionally
including a count of the number of times the characteristic
is exhibited. A wide variety of descriptors is available to
evaluate the potential similarity or diversity between
structures (2). These range from one-dimensional (1D)
descriptors based on molecular properties such as molecular
weight, which can be derived from the molecular formula;
through two-dimensional (2D) substructural fingerprints,
topological methods, and atomic/molecular properties [e.g.,
physicochemical properties such as calculated log P
(c log P)] that require knowledge of the "flat" or 2D
structure, which represents the bonds between the atoms; to
three-dimensional (3D) properties (e.g., pharmacophoric
fingerprints), requiring knowledge of the full 3D
conformational space available to a molecule. A 3D
pharmacophoric fingerprint marks the presence or absence of
potential pharmacophores [combinations of different features
and distances between them, often for three- or four-point
pharmacophore fingerprints (i.e., triplets/triangles or
quartets/tetrahedra)] within a molecule.

Figure 5.1. A simple illustration of bit-string encoding of
chemical structure (7). (a) A fragment dictionary-based
approach. (b) Illustration of a hashing scheme using a
path-based decomposition of the structure. The asterisk
denotes an element in the bit string where a collision has
resulted from the hashing procedure.

Three-dimensional properties such as the pharmacophore
fingerprints can also be calculated for the target protein
binding site, being derived from site points complementary to
the functional groups in the protein backbone and side
chains, thus bridging the ligand-based and protein
structure-based universes. The pharmacophore fingerprints
also represent a simplified approach to the goal of providing
molecular descriptors with 3D shape and property content,
while obviating the need for molecular superposition or
refined pharmacophore hypothesis generation.

Partitioning methods are widely used. The 
compounds are grouped using either a cell- 
based approach, in which each dimension of 
the chemical space is subdivided or "binned," 
or by a clustering approach, in which islands of 
similar compounds are formed. Alternatively, 
the distance between pairs of molecules can be 
calculated, and this distance minimized (for 
similarity) or maximized (for diversity). For 
diversity the goal is normally not to identify a 
diverse compound in isolation, but to explore a 
range of diversity through selection of a di- 
verse subset of compounds. Cell-based meth- 
ods provide the advantage of a common frame 
of reference in terms of the multidimensional 
cell positions. It is possible with a cell-based 
method to evaluate both what is there and 
what is missing (in terms of empty cells); clus- 
tering, by contrast, is based on exploring what 
is there. The same method/descriptor may 
thus be used to evaluate both similarity and 
"diversity." In practice, "dissimilarity" ap- 
proaches often provide a more acceptable ap- 
proach to diversity, ensuring that compounds 
are not too similar, but avoiding a potential 
pitfall of exploring too frequently the extremes of chemical
space. Methods and descriptors are discussed for each of
these categories.
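
The cell-based idea described above can be sketched in a few lines of Python: each descriptor axis is divided into bins, every compound is assigned to a cell, and both the occupied and the empty cells can then be listed. The two-dimensional descriptor values, compound names, and function names below are hypothetical.

```python
from itertools import product


def cell_index(descriptor, lows, highs, bins):
    """Map a descriptor vector to a tuple of bin indices, one per axis."""
    index = []
    for value, low, high in zip(descriptor, lows, highs):
        fraction = (value - low) / (high - low)
        # Clamp so that values on or beyond the edges fall in the end bins.
        index.append(max(0, min(int(fraction * bins), bins - 1)))
    return tuple(index)


def partition(compounds, lows, highs, bins):
    """Group compound names by the cell their descriptors fall into."""
    cells = {}
    for name, descriptor in compounds.items():
        key = cell_index(descriptor, lows, highs, bins)
        cells.setdefault(key, []).append(name)
    return cells


# Hypothetical 2D chemistry space with each axis scaled to [0, 1].
compounds = {"A": (0.10, 0.20), "B": (0.15, 0.25),
             "C": (0.90, 0.80), "D": (0.50, 0.10)}
cells = partition(compounds, lows=(0.0, 0.0), highs=(1.0, 1.0), bins=2)

all_cells = set(product(range(2), repeat=2))
empty_cells = sorted(all_cells - set(cells))

print("occupied:", cells)
print("empty:", empty_cells)
```

Picking one compound per occupied cell gives a simple representative subset, while the empty cells show which regions of the space the collection does not cover; a clustering method would report only the former.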

1.3 Combinatorial Library Design 

Combinatorial library design is an important 
application of molecular similarity and diver- 
sity principles and methods. Combinatorial 
chemistry approaches can exploit automation 
and robotics to enable the rapid production of 
large numbers of compounds. Libraries are 
synthesized for both lead identification and 
lead optimization purposes. The resultant li- 
braries consist of products formed by combin- 
ing "reactants" (reagents, monomers) with 
each other or with a "scaffold" (template, 
core). The most efficient use of reactants and 
automation/robotics would use a strictly com- 
binatorial combination of reactants/scaffold, 
but other constraints, including the issue of 
generating products that have suitable prop- 
erties for biological screening and as potential 
drugs, often lead to sparse arrays. Parallel 
synthesis, in which multiple analogs are syn- 
thesized at a time, is now a standard part of 
the drug discovery process. 

Many molecular diversity and similarity 
approaches are brought together in the com- 
binatorial library design process. Either the 
properties of the reactants/scaffolds are used 
(reactant-based design) or the properties of 
the resultant enumerated products are used in 
selecting appropriate reactants (product- 
based reactant selection). The latter approach 
requires much greater computational re- 
sources, and a preselection of potential reac- 
tants may need to be made to control the total 
size of the "virtual" (potentially synthesiz- 
able) library to be analyzed. Regardless of the 
method, the required deliverable is sets of re- 
actants/scaffolds to be combined. When work- 
ing with the properties of the products, the 
constraint that reactants are to be used as ef- 
ficiently as possible presents a major optimi- 
zation problem. 
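
The tension between a fully combinatorial array and product-based constraints can be illustrated with a toy enumeration. In this Python sketch the reactant names, their masses, the scaffold mass, the additive mass model, and the cutoff of 500 are all hypothetical assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical reactant sets with illustrative masses; the product mass
# is modeled as a simple sum, ignoring any atoms lost in the reaction.
scaffold_mass = 200.0
r1 = {"A1": 100.0, "A2": 150.0, "A3": 220.0}
r2 = {"B1": 90.0, "B2": 160.0}

# Enumerate the full virtual library (every R1 x R2 combination).
virtual_library = {
    (a, b): scaffold_mass + r1[a] + r2[b] for a, b in product(r1, r2)
}

# Product-based filter: keep only products under the mass cutoff.
accepted = {pair for pair, mass in virtual_library.items() if mass <= 500.0}

# The accepted set is a sparse array: not every reactant pairs with
# every other, so reactants are no longer used with full efficiency.
for a in r1:
    row = ["x" if (a, b) in accepted else "." for b in r2]
    print(a, " ".join(row))
```

Only three of the six products pass the filter, and no 2 x 2 fully combinatorial block survives, which is the optimization problem referred to above in miniature.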

Virtual screening, with experimental verification by
biological screening, has provided a validation of many of
the molecular similarity/diversity methods used for
combinatorial library design, and some ligand-based
approaches and examples are discussed in Section 3.



1.4 Subset Selection and Screening Set Enrichment

A related task to combinatorial library design that uses
molecular diversity/similarity methods is subset selection of
compound screening sets. Initial efforts were focused on
small "diverse" or "representative" sets of large corporate
compound collections. The increased capabilities of high
throughput screening have
changed the demand for such sets, and there is 
a renewed demand for "focused" and "repre- 
sentative" screening subsets of varying sizes; 
this includes target class (gene family) focus 
and the identification of "interesting" (e.g., 
novel) compounds in a large set. Newer bio- 
physical screening methods [e.g., NMR-based 
screening (3)] still have capacity issues and a 
need for smaller representative and focused 
sets. Diverse subset selection can be used to 
generate sets of compounds to probe a biolog- 
ical assay or to select a subset of reactants to 
probe the scope of a chemical reaction scheme 
or screen. However, such methods have a ten- 
dency to select compounds at the extremes of 
chemical space; that is, the selected com- 
pounds tend to be less suitable as drug candi- 
dates, and hence the approach is less favored 
for general screening sets. Rather, diversity methods are
used to ensure that a random subset of a screening set
contains compounds that are representative of the whole, or,
in conjunction with a focused method, to ensure a
representative sampling of biologically relevant chemical
space.
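
The tendency of diverse subset selection to pick compounds at the extremes of chemical space can be seen in a small example. The Python sketch below uses a greedy maximum-dissimilarity scheme (often called MaxMin); this is one common choice and is an assumption here, not necessarily the method used in the studies cited, and the descriptor coordinates are hypothetical.

```python
import math


def maxmin_select(points, k, seed_index=0):
    """Greedy MaxMin: repeatedly add the point farthest from the
    already-selected set (distance to its nearest selected point)."""
    selected = [seed_index]
    nearest = [math.dist(p, points[seed_index]) for p in points]
    while len(selected) < k:
        candidate = max(range(len(points)), key=lambda i: nearest[i])
        selected.append(candidate)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[candidate]))
    return selected


# Hypothetical 2D descriptor space; index 3 sits in the middle.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
picked = maxmin_select(points, k=3)
print([points[i] for i in picked])
```

With these coordinates the three selected points are all corners of the space, and the central point is passed over, which mirrors the concern raised above about general screening sets.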

Compound subsets focused/biased to particular target classes
(gene families) have become of greater importance, with
application to both lead identification and de-orphaning of
new targets from genomics studies. Properties
important for the target class of interest are 
identified, using descriptors used for molecu- 
lar similarity/diversity. A focused subset can 
then be selected using a combination of all the 
possible hypotheses for activity for that target 
class, including the use of one or more molecular similarity
approaches to select compounds similar to any known active
compounds. For targets that have structural information
available, docking methods (one widely used method for
virtual screening) can be used to select compounds that are
complementary to the binding site(s). Applications
encompass both high throughput screening 
(HTS) and therapeutic area screening where 
only smaller numbers of compounds can be 
screened. For HTS, smaller thematic studies 
using these enriched focused sets enable the 
rapid prosecution of a set of related targets, 
and make the use of duplicate runs for all com- 
pounds feasible. This enables selectivity to be 
addressed up front, and the duplicate runs 
provide potentially higher quality informa- 
tion, with the potential for the identification of 
hits that might otherwise be missed. 

General enrichment of the available screen- 
ing compound set for lead identification is a 
major application for both combinatorial li- 
brary design/synthesis and compound acquisi- 
tion. The goal of in silico (i.e., computer- 
based) studies in compound acquisition is to 
evaluate the interest of compounds that could 
be purchased to add to the screening file, and 
to select a subset that meets the same type of
physicochemical/"druglikeness" criteria discussed for
combinatorial libraries. The "interest" of a compound or
compound set is evaluated as in combinatorial library design:
diversity relative to existing compounds, target focus,
target-class focus, and so forth.

2 MOLECULAR SIMILARITY/DIVERSITY 

The field of medicinal chemistry is based on 
the hypothesis that similar compounds will 
display similar, but probably not identical, ac- 
tivities in some biological screen, and that po- 
tency, selectivity, and properties can thus be 
modulated by analog synthesis. The challenge 
facing the computational chemist is how to 
represent compounds in a computer in such a 
way that "similar" compounds in the in silico 
world are "similar" in the biological world. It is evident
that the biological process that is being modeled will
influence the nature of the chosen representation. For
example, c log P is a useful descriptor for modeling
processes involving cell penetration, whereas a
pharmacophoric representation would be more appropriate for
selecting compounds for screening against a particular
protein active site. In this section we review the wide range
of representations that have been developed and describe the
various methods for applying these representations to
real-world problems. The reader is referred to a number of
reviews on various aspects covered by this chapter (2, 4-9).
A diverse set of perspectives/reminiscences on computational
aspects of molecular diversity has been assembled by Martin
(10).

2.1 Descriptors 

The problem lies in finding a representation of 
chemical structure that allows a mapping be- 
tween the chemical structure and its response 
in a biological or physical process. The repre- 
sentation must be general enough to be appli- 
cable to a range of chemical structures but 
specific enough to capture the differences be- 
tween structures that account for differences 
in response. Once found, this representation or set of
descriptors can be said to define a chemistry space (11) for
the population of compounds of interest. The similarity
between two compounds is then expressed by their distance
within this space. Unfortunately, this simple statement hides
a number of difficulties. Many descriptors of choice are
correlated, and it can be difficult to combine categorical
(e.g., acid, base, neutral) and real-valued (charge, dipole,
c log P) variables. The issue of how to analyze compounds
within the chemistry space is covered in Section 2.2.

Methods for describing chemical structures 
fall into two broad classes. Two-dimensional 
(2D) methods can be calculated from the 2D 
graph in which atoms are nodes in the graph 
and the bonds are the connections between 
the nodes. Three-dimensional (3D) methods require the
generation of a 3D structure (x, y, z coordinates) for a
molecule. Because a molecule does not exist as a single low
energy conformer, conformer generation must also be addressed
with this latter class of methods.
Combining the various descriptors, particu- 
larly 2D and 3D, is an area of active research. 

The potential advantage of 3D descriptors (ligand-protein
binding is a 3D spatial/electronic property that can be
described only in part using 2D descriptors) (5c) has led
many groups to identify 3D descriptors that can handle large
numbers of compounds and multiple potential models, and do
not require a superimposition in 3D coordinate space (for a
review, see Ref. 12). The pharmacophore fingerprints
described in Section 2.1.3 are an example of this.

2.1.1 2D Substructural and Topological Descriptors. The
principle behind substructural keys or fingerprints is shown
in Figure 5.1. A molecule is encoded by the presence or
absence of a set of predefined atoms, atom types, and
fragments (e.g., S, aromatic nitrogen, CO2H). The most widely
used set of keys is the publicly available ISIS (MACCS) key
set provided by MDL (13a). An alternative to the use of
predefined fragments is provided by software packages such as
Daylight (13b) and UNITY (13c). In this approach, all
possible bond paths in a molecule from zero (the atoms) to a
specified number of bonds (usually 7) are identified. A
hashing procedure is used to store the paths in a bit string
of fixed length. Each path will set several bits in the bit
string (giving them the value of 1), and there is the
possibility of different paths setting some of the same bits.
As a result, an individual bit no longer corresponds to a
single identifiable fragment.
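
The path-based hashing scheme can be sketched as follows. This is a simplified illustration of the general idea, not the algorithm used by any of the packages named above: atoms are reduced to element symbols, bond orders are ignored, MD5 is used as a convenient deterministic hash, and the bit-string length and the number of bits per path are arbitrary assumptions.

```python
import hashlib


def enumerate_paths(atoms, bonds, max_bonds=7):
    """Return all linear atom paths of 0..max_bonds bonds as strings."""
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    paths = set()

    def extend(path):
        labels = [atoms[i] for i in path]
        forward = "-".join(labels)
        backward = "-".join(reversed(labels))
        paths.add(min(forward, backward))  # direction-independent form
        if len(path) - 1 == max_bonds:
            return
        for nxt in neighbors[path[-1]]:
            if nxt not in path:            # no atom visited twice
                extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return paths


def hashed_fingerprint(atoms, bonds, n_bits=64, bits_per_path=2):
    """Hash every path into a fixed-length bit string (stored as an int)."""
    fingerprint = 0
    for path in enumerate_paths(atoms, bonds):
        digest = hashlib.md5(path.encode()).digest()
        for k in range(bits_per_path):
            chunk = digest[4 * k:4 * k + 4]
            position = int.from_bytes(chunk, "big") % n_bits
            fingerprint |= 1 << position   # different paths may collide
    return fingerprint


# Ethanol heavy-atom skeleton: C-C-O
ethanol_paths = enumerate_paths(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_fp = hashed_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
print(sorted(ethanol_paths))
print(format(ethanol_fp, "064b"))
```

Because several paths can set the same position, the number of bits set may be smaller than the number of paths times the bits per path, which is the collision marked by the asterisk in Figure 5.1.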

The origin of the 2D substructural repre- 
sentation lies in the first chemical registration 
systems where some means was required to 
enhance the speed of compound retrieval. 
Thus, if the query molecule contains a partic- 
ular combination of features, the whole data- 
base can be screened very rapidly using the 
keys to identify compounds that are likely to 
contain those features before a more exhaus- 
tive graph matching is performed to ensure an 
exact match with the query. The features rep- 
resented in the keys (ISIS) or the fingerprint 
length and density (Daylight) were selected to 
optimize the process of compound retrieval. 
Nevertheless, they have proved very useful for 
a variety of similarity-based tasks (14). De- 
spite these successes, issues surrounding their 
use in diversity-based approaches have been 
highlighted (15). 
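
The retrieval use of keys described above amounts to a bitwise subset test: a database molecule can contain the query only if every key bit set for the query is also set for the molecule. A minimal Python sketch, with hypothetical four-bit keys and compound names, is:

```python
def may_contain(query_keys, molecule_keys):
    """True if every bit set in the query is also set in the molecule.

    Passing this test is necessary but not sufficient for a substructure
    match; survivors still need exhaustive graph matching.
    """
    return (query_keys & molecule_keys) == query_keys


# Hypothetical key sets stored as integers (one bit per feature).
database = {"mol1": 0b1011, "mol2": 0b0110, "mol3": 0b1111}
query = 0b0011

candidates = [name for name, keys in database.items()
              if may_contain(query, keys)]
print(candidates)
```

Only the candidates that pass this inexpensive screen are handed on to the slower atom-by-atom matching step.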

Molecular connectivity indices were first proposed by Randic
in 1975 (16) as a means of estimating physical properties of
alkanes. This formalism was quickly extended to other types
of molecules (17) and, since then, a wide range of indices
has been proposed, as reviewed by Hall and Kier (18) and
Randic (19). The indices are derived from a graph theoretical
representation of the structure where bonds are represented
by the edges between nodes (atoms). They provide a direct
representation of the topological structure of a molecule,
encoding information such as the degree of branching and the
adjacency of the branch points (each expressed as a χ index
of a given order), flexibility, and shape (20a). The
superscript on a χ index describes the number of bonds in the
path between atoms used to calculate the index. The software
package MOLCONN-Z (20b) was developed specifically for
generating these descriptors. A number of authors have
included topological indices or variants thereof in their
description of molecules for describing compound collections
(21) or large combinatorial libraries, often allied to a
dimensionality-reduction algorithm such as principal
components analysis (PCA) (6a, 23).
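
As a concrete instance of a connectivity index, the first-order index of Randic sums 1/sqrt(d_i d_j) over all bonds of the hydrogen-suppressed graph, where d_i is the number of heavy-atom neighbors of atom i. The Python sketch below computes it for two C4 alkanes; the function name and graph encoding are illustrative assumptions.

```python
import math


def randic_index(n_atoms, bonds):
    """First-order connectivity index of a hydrogen-suppressed graph."""
    degree = [0] * n_atoms
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    return sum(1.0 / math.sqrt(degree[i] * degree[j]) for i, j in bonds)


# n-Butane: C0-C1-C2-C3; isobutane: C1 bonded to C0, C2, and C3.
n_butane = randic_index(4, [(0, 1), (1, 2), (2, 3)])
isobutane = randic_index(4, [(0, 1), (1, 2), (1, 3)])

print(f"n-butane  {n_butane:.3f}")
print(f"isobutane {isobutane:.3f}")
```

The branched isomer gives the smaller value, showing how the index responds to the degree of branching for molecules with the same formula.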

Carhart et al. (24) introduced the concept of atom pairs,
where the topological distance (number of bonds) between
atoms of specified element type is encoded in a bit string.
This was extended to the topological torsion (25), where
elements on all paths of length four are encoded. Kearsley et
al. (26) extended this approach to use more generic atom-type
properties in place of element type. They termed these types
binding property classes because they represent key features
of intermolecular interactions (positive and negative charge;
hydrogen bond donor, hydrogen bond acceptor, and groups that
are both of these, such as hydroxyl; hydrophobic atoms; and
all others). These descriptors have been used widely for
similarity- and diversity-related tasks. The CATS descriptors
of Schneider et al. (27a) are a variant on this approach. All
topological distances (number of bonds) between a pair of
binding property classes (e.g., acid-base) in a molecule are
recorded with count information in a correlation vector; that
is, the vector records how often each topological distance
occurs between a specified pair of features in the molecule
of interest. Similarity is calculated as the Euclidean
distance between the correlation vectors. These CATS
descriptors were shown to be useful in scaffold-hopping, that
is, identifying actives with a structural type distinct from
that of the initial lead structure, and have also been used
as the basis for a de novo design program, TOPAS (27b).
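
The correlation-vector idea can be sketched in Python as a count of (feature, feature, topological distance) triples, compared by Euclidean distance. This is a simplified illustration and not the published CATS implementation: the feature labels, the two example chains, and the absence of any scaling are assumptions.

```python
import math
from collections import Counter, deque


def topological_distances(n_atoms, bonds):
    """Shortest bond-count distance between all atom pairs (BFS)."""
    neighbors = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    all_distances = []
    for start in range(n_atoms):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        all_distances.append(seen)
    return all_distances


def correlation_vector(types, bonds):
    """Count how often each feature pair occurs at each distance."""
    dist = topological_distances(len(types), bonds)
    counts = Counter()
    for i in range(len(types)):
        for j in range(i + 1, len(types)):
            if j in dist[i]:
                a, b = sorted((types[i], types[j]))
                counts[(a, b, dist[i][j])] += 1
    return counts


def euclidean(c1, c2):
    keys = set(c1) | set(c2)
    return math.sqrt(sum((c1[k] - c2[k]) ** 2 for k in keys))


# Hypothetical typed chains: donor ... acceptor separated by 3 or 4 bonds.
mol_a = correlation_vector(["donor", "hydrophobe", "hydrophobe", "acceptor"],
                           [(0, 1), (1, 2), (2, 3)])
mol_b = correlation_vector(
    ["donor", "hydrophobe", "hydrophobe", "hydrophobe", "acceptor"],
    [(0, 1), (1, 2), (2, 3), (3, 4)])

print(mol_a)
print(f"distance a-b: {euclidean(mol_a, mol_b):.3f}")
```

Because only feature types and bond counts enter the vector, two molecules with different scaffolds but similar feature spacing can come out close together, which is the basis of the scaffold-hopping use mentioned above.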

Functional diversity requirements of compound libraries have
been reviewed (28), for which molecular descriptors that
relate to both structure and properties are needed, as well
as their evaluation in terms of biological relevance.

2.1.2 Atomic/Molecular Properties and 2D/3D Structural
Descriptors

2.1.2.1 Physicochemical. The descriptors in the previous
section focus largely on the structure of the compound. The
binding property classes generalize this to some extent by
replacing relationships between elements or atom types with a
broader definition, still within the framework of an
atoms-and-bonds description of the molecule. An alternative
approach would be to describe compounds by whole molecule
properties, such as molecular weight and log P. Indeed, such
properties have been related to important pharmacological and
physical properties such as absorption across cell membranes,
distribution, and solubility. These properties are
represented, in part, by the well-known Lipinski Rule-of-5
based on molecular weight, calculated log P, and hydrogen
bond donor and acceptor counts (29). Thus, such properties
have an important role in drug design, and in general
assessments of "druggability." However, their use as
descriptors for tasks related to similarity or diversity in
the context of receptor affinity is less clear and has been
questioned (11b). A primary concern is that such properties
do not reflect sufficient information regarding chemical
structure to enable their use for lead follow-up or similar
purposes. For example, a steroid and a benzodiazepine can
have identical log P values but are clearly dissimilar from a
medicinal chemistry perspective. Another major problem is
that many properties (e.g., log P, molecular weight, surface
area, volume, molar refractivity, molecular polarizability)
are correlated, making it difficult to find a reasonable set
of orthogonal descriptors for the calculation of meaningful
distances or for cell-based partitioning (see Section 2.2.1).
Such whole molecule properties are best used as constraints
on a design, to define boundaries of a pharmacologically
relevant chemical space or to define a distribution to match.
The challenge is then how to combine the measures of
diversity while simultaneously maintaining suitable
physicochemical properties. This is addressed in later
sections. Such properties can also be used to identify
particular combinations that are preferred for different gene
families, and these are used to focus a design.
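
Using whole molecule properties as constraints rather than as similarity descriptors can be illustrated with a Rule-of-5 style filter. The Python sketch below uses the commonly quoted thresholds (molecular weight 500, calculated log P 5, 5 hydrogen bond donors, 10 hydrogen bond acceptors); the compound names and property values are hypothetical, and how many violations to tolerate is a design decision left to the user.

```python
def rule_of_five_violations(mol_weight, clogp, h_donors, h_acceptors):
    """Count how many of the four Rule-of-5 thresholds are exceeded."""
    return sum([
        mol_weight > 500,
        clogp > 5,
        h_donors > 5,
        h_acceptors > 10,
    ])


# Hypothetical property tuples: (MW, c log P, H-bond donors, H-bond acceptors)
candidates = {
    "cmpd_1": (350.0, 2.5, 2, 5),
    "cmpd_2": (650.0, 6.2, 2, 5),
    "cmpd_3": (480.0, 4.9, 6, 9),
}

violations = {name: rule_of_five_violations(*props)
              for name, props in candidates.items()}

# Use the properties as a boundary on the design space, here allowing
# at most one violation (an illustrative choice).
within_bounds = [name for name, n in violations.items() if n <= 1]
print(violations)
print(within_bounds)
```

A filter of this kind defines the boundary of the design space; it says nothing about whether two compounds inside that boundary are structurally similar.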

2.1.2.2 2D/3D Structural. The issues with whole molecule
descriptors mentioned above led Pearlman (11) and colleagues
to look at an alternative representation ("BCUT"
descriptors/metrics) based on atomic properties and on how
atoms are connected. The approach stems from original work of
Burden (30) to derive a unique signature for a molecule.
Pearlman extended the concept to develop the BCUT descriptors
suitable for diversity- and similarity-related tasks. Each
molecule is described by a series of square matrices with
atom labels defining the rows and columns. In a given matrix,
the diagonal represents an atomic property such as charge,
hydrogen bonding ability (donor/acceptor), or polarizability,
with optional weighting by accessible surface area; the
off-diagonal terms represent topological or Cartesian
interatomic distance or other such property. Molecular
descriptors are generated from the lowest and highest
eigenvalues of these matrices, and describe the molecular
surface distributions of positive or negative charge, H-bond
donors, H-bond acceptors, and high or low polarizability.
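
The construction just described (atomic property on the diagonal, a connectivity-related term off the diagonal, descriptors taken from the extreme eigenvalues) can be sketched in pure Python. This is an illustrative toy, not the BCUT implementation: the off-diagonal values of 0.1 for bonded and 0.001 for nonbonded pairs, the partial charges, and the use of a Jacobi rotation solver are all assumptions made for the example.

```python
import math


def burden_like_matrix(atom_property, bonds, bonded=0.1, nonbonded=0.001):
    """Symmetric matrix: property on the diagonal, bond term elsewhere."""
    n = len(atom_property)
    matrix = [[nonbonded] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = atom_property[i]
    for i, j in bonds:
        matrix[i][j] = matrix[j][i] = bonded
    return matrix


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-18):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):   # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):   # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


# Hypothetical partial charges for a three-atom chain (e.g., C-C-O).
charges = [0.05, 0.10, -0.40]
eigenvalues = symmetric_eigenvalues(
    burden_like_matrix(charges, [(0, 1), (1, 2)]))
lowest, highest = eigenvalues[0], eigenvalues[-1]
print(f"lowest {lowest:.4f}, highest {highest:.4f}")
```

Repeating the calculation with a different diagonal property (for example, polarizability) would give a further pair of values, so a handful of matrices yields a low-dimensional descriptor set of the kind discussed here.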

A number of such matrices can be calculated based on the
nature of the diagonal and off-diagonal properties and the
scaling between them. An "auto-choose" algorithm [see the
DiverseSolutions (DVS) program below] typically finds a 5D or
6D orthogonal chemistry space that best represents the
diversity of a given population. This ability to identify
relevant (to drug-receptor interactions and reflecting
molecular substructure) and orthogonal (noncorrelated)
descriptors is critical for the effective use of both
distance-based and cell-based methods. Three-dimensional
properties may be included by the use of a single conformer
to represent atom-atom distances or the inclusion of quantum
mechanical properties (bond order or overlap-squared) from
semiempirical molecular orbital (MO) calculation. However,
the inclusion of 3D/MO information significantly slows down
descriptor calculation and does not appear to offer any
practical advantage. DVS, a suite of programs, has been
written to calculate and manipulate the BCUT (and other)
descriptors for a variety of library design, similarity- and
diversity-related tasks (31) (see Section 2.2.1.1). DVS uses
the power of a cell-based method as a rapid means to derive a
chemistry space relevant to the representation of the
diversity of large populations of compounds and methods to
pick diverse subsets and compare large data sets rapidly. The
BCUTs provide an excellent diversity metric based on
electronic properties directly related to ligand-receptor
interaction that should also relate to biological activity.
Indeed, BCUT metrics appear to reflect pharmacophorically
important information, albeit in a relatively crude (low
dimensional) fashion. They have proven useful for
quantitative structure-activity relationship (QSAR) and
quantitative structure-property relationship (QSPR) analyses
(32a, b), classification of pharmacologically active
compounds (32j), diverse and focused combinatorial library
design (11d, 32c-e), rational compound acquisition strategies
(11c), and various other diversity-related tasks.

2.1.3 3D Properties. The properties and 
descriptors above are essentially 2D in nature, 
in that they can be generated from the com- 
pound connectivity table, that is, from a 
knowledge of the bonding pattern within a 
molecule. There are many advantages to this, 
not the least of which is the speed of descriptor 
calculation. Nevertheless, compound interac- 
tions with most biological targets are largely 
3D in nature. That is, it is the disposition of 
key functional groups in the molecule in rela- 
tion to complementary groups within the en- 
zyme or receptor that is important. Thus, 
there has been much active research into how 
best to represent the spatial properties of mol- 
ecules. A particular issue that needs to be han- 
dled is conformational flexibility, given that 
most compounds have rotatable bonds that 
will change the 3D properties and there is no 
means a priori of identifying which particular
conformation is the bioactive conformation 
(i.e., the conformation of the ligand bound to 
the biomolecular receptor). 

The methods presented below tackle this in 
one of three ways: 



1. A single fixed conformer is used. 

2. A relatively small number of representa- 
tive conformers is generated. 

3. An exhaustive enumeration of conformers
is used.

3D BCUTs are an example of case (1). They
reflect conformational differences, but only to
a limited extent because they are inherently 
low dimensional. This is actually somewhat 
advantageous because the single low energy 
conformation from which they are computed 
may or may not be similar to the bound con- 
formation for a particular receptor. Pearlman 
(11) has noted that 3D BCUTs appear to be 
advantageous when the population of interest 
is a single combinatorial library but that, on 
average, 2D and 3D BCUTs appear to be 
equally advantageous when the population of 
interest is much more diverse. In cases (2) and 
(3), the descriptors need to be accumulated 
over all conformers. In the case of bit strings 
this means “ORing” them over all conforma- 
tions (combine using logical OR). Herein lies a 
potential issue with such techniques, in that 
data from multiple conformations could ob- 
scure the signal from the particular bound 
conformation relevant to a particular target. 
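
The accumulation of bit strings over conformers can be illustrated with a minimal sketch. It assumes each conformer's key is held as a Python integer; the three conformer keys are invented values, and `combine_conformer_keys` is a hypothetical helper name, not part of any program cited in this chapter.

```python
from functools import reduce
from operator import or_


def combine_conformer_keys(conformer_keys):
    """Return the logical OR of per-conformer bit strings (held as ints)."""
    return reduce(or_, conformer_keys, 0)


# Three invented conformer keys for one molecule (8-bit toy example).
conformer_keys = [0b00010010, 0b01000010, 0b00001000]
molecule_key = combine_conformer_keys(conformer_keys)
print(f"{molecule_key:08b}")  # 01011010

# If the first conformer were the bound one, only part of the combined
# key would carry its signal; the rest comes from other conformers.
bound_bits = bin(conformer_keys[0]).count("1")
total_bits = bin(molecule_key).count("1")
print(bound_bits, "of", total_bits, "on-bits come from the bound conformer")
```

In this toy case half of the on-bits in the combined key do not belong to the assumed bound conformation, which is the dilution effect described above.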

2.1.3.1 3D Pharmacophores. The representation
of a set of active compounds by a
single or small set of pharmacophores that is 
necessary for that activity was first proposed 
many years ago and is an excellent model for 
lead optimization. The development of data- 
base systems capable of handling three-di- 
mensional structures in the late 1980s en- 
abled the further exploitation of such methods 
through giving the ability to search a corpo- 
rate collection for molecules containing a par- 
ticular pharmacophore. This approach to lead 
generation has proved highly successful (e.g., 
for reviews, see Ref. 33). In particular it is pos- 
sible to identify active compounds that con- 
tain a different core structure from that of the 
compounds used to generate the model (lead- 
hopping). This success and the importance of 
the pharmacophore hypothesis in understand- 
ing the interaction of a ligand with a protein 
target prompted groups to look for ways to use 
pharmacophores to generate a molecular descriptor
for similarity- and diversity-related
tasks. The diversity-related use was based on
the hypothesis that sampling over all potential
pharmacophores leads to diversity in a biologically
relevant space, in contrast to some other
methods that focus on chemical diversity. The
descriptor thus generated identifies in a systematic
way all the potential pharmacophores
that a molecule could exhibit. Triplet (three-point)
and quartet (four-point) pharmacophore
representations have been extensively
used (in addition to two-point/2D approaches),
with a variety of features sampled at each
point and interfeature distances considered in
a discrete set of ranges ("bins") (see Fig. 5.2).
The ability of pharmacophores to divorce the
three-dimensional structural requirements
for biological activity from the two-dimensional
chemical makeup of a ligand has been
highlighted in a recent review (34).

Figure 5.2. Illustration of the creation of a pharmacophore key. As the conformation of a molecule
changes, so do the distances between the pharmacophoric groups, shown as spheres. The two different
three-point pharmacophores shown each set their own particular bit in the pharmacophore key.

In an initial implementation from the au- 
thors (35), a set of 5916 three-point pharma- 
cophore queries was generated and used to 
search a database. Compounds were characterized
by the pharmacophores that they
matched. This method was powerful because 
it gave precise control over the queries that 
were generated and ensured that the com- 
pounds matched the query, as opposed to sat- 
isfying a set of distance constraints; however, 
it was slow in execution. The Chem-X/Chem- 
Diverse implementation (36) generates a 
pharmacophore fingerprint during the course 
of a single systematic conformational search, 
with a bump-check and/or rules to eliminate 
high energy conformers. The details of the 
conformational search and the definitions of 
the pharmacophoric features are key compo- 
nents of the system and this methodology has 
been used extensively for a range of library 
design and both diversity- and similarity-based
tasks (e.g., see Ref. 37). The use of 3D
pharmacophores in drug design applications 
has recently been reviewed (12, 34). 

To perform the necessary analyses to gen- 
erate the pharmacophore fingerprint, relevant 
features in a molecule need to be identified.
Either substructural definitions to find pharmacophoric
features are applied at search time or
atom types [and, optionally, additional centroid 
"dummy" atoms (35)] are used. These can be 
preassigned (e.g., on database registration) or 
assigned at search time; a variety of approaches 
is used (including the use of substructures and
connectivity, and of more sophisticated compu- 
tational approaches). Six properties (features) 
have commonly been used to describe the poten- 
tial pharmacophoric features of a structure: 

1. Hydrogen bond donor (e.g., amide NH, ar- 
omatic amine, and hydroxyl) 

2. Hydrogen bond acceptor (e.g., carbonyl, 
ether, and hydroxyl) 

3. Basic ionizable center (positively charged 
at physiological pH of about 7) (e.g., ali- 
phatic amines, amidines/guanidines, and 
4-amino pyridine) 

4. Acidic ionizable center (negatively charged 
at physiological pH of about 7) (e.g., carboxylic
acid, unsubstituted tetrazole, acyl
sulfonamide) 

5. Aromatic rings (ring centroids often used) 

6. Hydrophobic regions (e.g., isopropyl, butyl, 
cyclopentyl, and certain aromatic rings) 

It has also been useful to define a seventh 
feature type in some situations. For example, 
it may be beneficial to classify separately the 
groups that can be both hydrogen bond donor 
and acceptor such as hydroxyl groups or imi- 
dazole nitrogens. Alternatively, the seventh 
feature provides a mechanism to identify an 
anchor point to substructures of particular in- 
terest (see Section 2.2.5). 

All combinations of three or four pharma- 
cophoric points (forming triangles or tetrahe- 
dra), for all accessible conformations of a given 
molecule, can be analyzed, with the resultant 
descriptor bit-string fingerprint (key) containing
the pharmacophores from the whole conformational
ensemble of the molecule (see Fig.
5.2). Each bit represents a particular combina- 
tion of pharmacophore points (Donor-Aro- 
matic-Acceptor, Donor-Aromatic-Basic, etc.) 
and distances between them (defined using 
discrete ranges, or bins). 
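
The construction of a three-point pharmacophore key can be sketched as follows. This is a minimal illustration, not the Chem-X/ChemDiverse implementation: the bin edges, the feature coordinates, and the names `BIN_EDGES`, `distance_bin`, and `triplet_keys` are all assumptions made for the example. Each triangle is stored in a canonical form (each vertex type paired with the bin of the opposite edge, then sorted) so that the order in which the features are listed does not matter.

```python
from itertools import combinations
from math import dist

# Assumed, illustrative upper bin edges in angstroms (nonuniform).
BIN_EDGES = [2.0, 3.0, 5.0, 8.0, 12.0, 16.0, 20.0]


def distance_bin(d):
    """Index of the first bin whose upper edge exceeds d, or None if too long."""
    for index, upper in enumerate(BIN_EDGES):
        if d < upper:
            return index
    return None


def triplet_keys(features):
    """Set of canonical three-point pharmacophores for one conformer.

    `features` is a list of (feature_type, (x, y, z)) tuples.
    """
    keys = set()
    for (ta, pa), (tb, pb), (tc, pc) in combinations(features, 3):
        # Pair each vertex with the binned length of the edge opposite to it.
        bins = (distance_bin(dist(pb, pc)),
                distance_bin(dist(pa, pc)),
                distance_bin(dist(pa, pb)))
        if None in bins:
            continue  # beyond the maximum distance considered
        keys.add(tuple(sorted(zip((ta, tb, tc), bins))))
    return keys


# Two invented conformers that differ only in the aromatic centroid position.
conformer_1 = [("donor", (0.0, 0.0, 0.0)),
               ("acceptor", (4.0, 0.0, 0.0)),
               ("aromatic", (0.0, 6.0, 0.0))]
conformer_2 = [("donor", (0.0, 0.0, 0.0)),
               ("acceptor", (4.0, 0.0, 0.0)),
               ("aromatic", (0.0, 2.5, 0.0))]

# The molecular key is the union (logical OR) over the conformers.
key = triplet_keys(conformer_1) | triplet_keys(conformer_2)
print(len(key))  # 2
```

A set of canonical tuples is used here instead of a fixed-length bit string purely for readability; mapping each tuple to a bit position would give the bit-string form described in the text.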




Figure 5.3. Example of how a 3D molecular structure
can be broken down into its constituent pharmacophoric
elements.

Figure 5.3 illustrates how a molecule can be 
broken down into pharmacophoric elements. 
The atom types can be assigned using sub- 
structural fragments, taking into account the 
environment (e.g., an NH group attached to a
conjugating group such as C=O is not basic or
an H-bond acceptor). Atom types can be automatically
assigned when reading a molecule,
such as through a customizable substructural
fragment database and parameterization file
(e.g., Chem-X/ChemDiverse software, 37a).
The fragments identify the environment of an
atom or group, enabling the correct assignment
of a designed feature type. Different options
can be set (e.g., a hydroxyl group can be
assigned to be both a hydrogen bond donor
and acceptor and/or can be assigned to a special
feature type for atoms that have both
characteristics), and reassignment is possible
at search time for structures stored in a database.
The identification and representation of
hydrophobic regions is one of the most diffi- 
cult yet critical tasks. Dummy atoms can be 
used to represent the hydrophobic regions, as 
a centroid of a group of relevant atoms. This 
limits the number of hydrophobic features to 
comparable numbers to other features. An au- 
tomatic method to add them that uses bond 
polarities (hydrophobic regions defined for 
groups of three or more atoms that are not 
bonded to atoms with a large electronegativity 
difference) has been implemented in Chem-X/ 
ChemDiverse. Other pharmacophore atom 
types have also been developed (26). 

The extension to four-point pharmacophores
enables chirality to be handled and enables
some elements of volume/shape linked to
electronic properties to be included. This can
give a much better performance in similarity
searching. It also increases enormously the
number of potential pharmacophores that
need to be considered. To analyze pharma- 
cophoric patterns in molecules, the distances 
between pharmacophoric features are divided 
into a finite number of ranges using a predefined
binning scheme (e.g., 0-2, 2-3, 3-5,
5-8 Å, etc.), up to a maximum distance normally
between 15 and 20 Å [a nonuniform binning
is often used because this mirrors the
tolerances (e.g., ±20%) used in 3D database 
searching that can be more appropriate than 
fixed increments, given the limited conforma- 
tional sampling that is possible]. The addi- 
tional pharmacophoric combinations created 
in moving from a three- to four-point description
provide additional shape information,
thus increasing molecular separation in simi- 
larity and diversity studies. 

Separation has a central role in determin- 
ing the final result of such calculations, with 
too little separation resulting in a noisy de- 
scriptor and too many molecules being defined 
as similar, whereas when too large a separa- 
tion exists, trivial differences can have a 
disproportionately negative effect on the sim- 
ilarity value. Conformational sampling is nec- 
essary, and the granularity of this affects the 
useful resolution that can be used, as defined 
by the number and size of the distance bins. 
The sampling is generally performed by tor- 
sional sampling of rotatable bonds. 

Thus fewer ranges are generally considered 
with four-point pharmacophores while con- 
comitantly maintaining or improving on the 
performance of three-point pharmacophore 
methods. For example, by the use of 32 dis- 
tances for three-point pharmacophores with 
seven different features possible for each of
the points, there are about one million possibilities
(35). Expanding to four-point pharmacophores,
just 15 distance bins generate about
350 million geometrically valid possibilities.
Therefore, for pragmatic reasons of both memory/disk
space and the limited resolution of
the conformational sampling that is normally
applied, seven or 10 distance ranges for four-point
pharmacophore fingerprints have been
used by Mason et al. (37) and recommended
for combinatorial library design and virtual
screening applications. Around 2-10 million
different potential pharmacophores are re- 
solved in such a fingerprint. A limited sam- 
pling of conformations has generally been 
used to achieve reasonable times (in seconds) 
for descriptor calculation. For example, Mason
et al. (37) use two (conjugated), three (single
bonds), or four (sp²-sp³ and some conjugated)
increments with large data sets, using a
systematic analysis for less flexible molecules 
and random sampling for flexible molecules. 
See Fig. 5.4 for a comparison of three- and 
four-point fingerprints. Software companies 
such as Accelrys, Tripos, the Chemical Computing
Group (MOE, http://www.chemcomp.com),
and Treweren Consultants (THINK)
are developing their versions of pharmacoph- 
ore fingerprinting methods, with three-point 
pharmacophore fingerprints already imple- 
mented. The automatic assignment of phar- 
macophore features such as hydrophobes, ac- 
ids and bases, conformational sampling, and 
other key options discussed above for the 
Chem-X software (now no longer supported; 
owned by Accelrys) such as nonuniform binning
are challenges that have variable levels of
current implementation; other options and ex- 
tensions such as overlapping bins are becom- 
ing available. 

Others have developed similar approaches
for library design (38, 39). Horvath (40) gen- 
erates an autocorrelogram of feature-feature 
distances for conformers and calculates a dis- 
similarity score that takes into account sepa- 
rate weightings for each feature and allows 
fuzziness between the distance bins. These 3D 
pharmacophoric descriptors were termed 
fuzzy bipolar pharmacophore autocorrelo- 
grams (FBPAs), and the use of fuzzy logic to 
build up and compare the fingerprints avoids 
the "all-or-nothing" bitwise match of bit- 
string representations in which sampling arti- 
facts can cause significant differences. The 
method has been shown useful in library de- 
sign and for analyzing selectivity profiles in 
terms of pharmacophore similarity (41). 

It is possible to represent not only a ligand 
by the potential pharmacophores it possesses 
but also a protein target. In this case the phar- 
macophore points are identified by the posi- 
tions where a ligand atom of a particular type 
(donor, acceptor, acid, base, hydrophobic, aromatic
centroid) is likely to bind and so provide
a complementary interaction with the adjacent
protein residue side chain. The pharmacophore
fingerprints are thus generated from
these complementary site points. The site
points can be positioned in the active site using
methods such as GRID (42), in which an
energetic survey of the site is made using a
variety of functional groups. Figure 5.5 illustrates
the favorable energy contours for a variety
of pharmacophoric probes for the Factor
Xa serine protease active site. Atoms (with associated
pharmacophore features) are then
added in the positions for the most favorable
interaction (also shown in Fig. 5.5).

Figure 5.4. Three- and four-point (triplet/quartet) pharmacophore fingerprint creation. Assignment
is often binary (on or off), although a count can be kept, and has been used in more recent
studies. The large difference in bin numbers between three- and four-point pharmacophores provides
additional shape information, thus increasing molecular separation in similarity and diversity studies.

Figure 5.5. GRID probes on Factor Xa site and the combined resultant complementary site points
that can be used for pharmacophore fingerprint calculations (lower right). See color insert.

The resultant ensemble of atoms represents
a hypothetical molecule that interacts at
all favorable positions in the binding site, and
a pharmacophore fingerprint is calculated
from this. This fingerprint represents a form
of "protein structure-based diversity," quanti- 
fying the range of different pharmacophoric 
shapes complementary to a target protein 
binding site. For example, for the Factor Xa 
serine protease active site, 13 complementary
site points generated a fingerprint of 2103 
four-point pharmacophore shapes, of which 
354 were the same as the 2062 found for the 
serine protease thrombin, generated from 13 
site points. Only 11 significant complemen- 
tary site points were found for the serine pro- 
tease trypsin, which has a less defined S4 
pocket. Of the 1233 total pharmacophore
shapes, 363 were in common with Factor Xa, 
with 120 in common for all three serine pro- 
teases. It is thus possible to identify ensembles 
of pharmacophores that can be used to both 
differentiate the sites (selectivity) and identify 
common features. Comparison of these pro- 
tein-derived pharmacophore fingerprints with 
known ligands, using four-point fingerprints, 
shows that they can be used for searching for 
novel ligands within a database and that they 
are specific enough to capture ligand selectiv- 
ity between similar proteins such as the serine 
proteases thrombin, Factor Xa, and trypsin
(37). With three-point fingerprints, the com- 
parison of ligand- and site-derived finger- 
prints could identify common binding motifs, 
although selectivity was not captured (37b). 

Pharmacophore fingerprints are relatively 
slow to calculate, however. Thus, their application
to very large virtual libraries requires a
great deal of computer power. Researchers at
Chiron (12, 43) have developed a pharmacophore-based
methodology applicable to reactants,
OSPPREYS (Oriented-Substituent
Pharmacophore PRopErtY Space). In this
approach, reactant pharmacophores are calculated
with respect to the reactant attachment
atom and combinations of up to nine pharmacophore
centers are considered (see Section
4.8). In the Gridding and Partitioning (GaP)
approach, developed at GlaxoWellcome (44),
reactants are aligned such that the bond between
the attachment atom and the first
nonhydrogen atom is along the x-axis with
the attachment atom at the origin. A conformational
analysis is then performed and the
pharmacophore features are mapped to a
1-Å grid (Fig. 5.6). Cells occupied by a particular
feature are recorded in a bit string.
This descriptor is ideally suited to monomer 
acquisition and reactant diversity. 
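
The gridding step of a GaP-style descriptor can be sketched as below. This is only an illustration under stated assumptions: the conformers are taken to be already aligned as described (attachment atom at the origin, first bond along the x-axis), the coordinates are invented, and `gap_cells` is a hypothetical helper name, not the published program.

```python
from math import floor


def gap_cells(conformers, cell_size=1.0):
    """Set of (feature_type, (i, j, k)) grid cells visited over all conformers.

    Each conformer is a list of (feature_type, (x, y, z)) tuples, assumed
    to be pre-aligned; `cell_size` is the grid spacing in angstroms.
    """
    occupied = set()
    for features in conformers:
        for feature_type, (x, y, z) in features:
            cell = (floor(x / cell_size),
                    floor(y / cell_size),
                    floor(z / cell_size))
            occupied.add((feature_type, cell))
    return occupied


# Two invented conformers of a reactant with an aromatic ring and an acid.
conformers = [
    [("aromatic", (3.2, 0.4, -0.7)), ("acid", (2.1, 1.9, 0.0))],
    [("aromatic", (3.6, 0.9, -0.2)), ("acid", (1.4, 2.3, 0.5))],
]
cells = gap_cells(conformers)
print(sorted(cells))
```

Here the aromatic centroid stays in one cell while the acid visits two, so three (feature, cell) pairs are recorded. Two reactants could then be compared through the overlap of their occupied-cell sets.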

Topomer shape similarity, developed by 
Cramer (45) at Tripos, has been used for sim- 
ilarity searching and targeted library design 
(using Tripos’ proprietary software, "ChemSpace"),
building on earlier work on steric
fields of single "topomeric" conformers, 
clustering reactants by their 3D steric fields 
into "bioisosteric" clusters. The descriptor 
was considered to be useful in describing 
variations about a fixed molecular core, de- 
fining a single, unambiguous, aligned con- 
formation for any nonchiral molecule. 

Approaches such as the GaP program that 
exploit 3D descriptors for monomer selection 
address a need for an easily accessible set of 
in-house monomers available for library gen- 
eration. Such monomers need to be diverse in 
nature and able to probe regions of space 
through attachment to known leads, while 
producing compounds with druglike proper- 
ties. More detailed conformational searching 
paradigms can be used for the smaller mono- 
mer compounds, and approaches such as GaP 
and OSPPREYS exploit this opportunity. 

For the selection of diverse compound subsets,
studies (46a) have compared three-point
pharmacophore descriptors and 2D finger- 
prints. These have highlighted benefits of the 
different approaches, and the improved per- 
formance of some combined descriptors. The 
use of clustering for the rational selection of 
compounds for acquisition and for in-house 
compound collections used for screening has 
also been investigated (46b), with comparable 
results obtained with 3D pharmacophore-de- 
rived fingerprints to the typically used 2D fin- 
gerprints. 

2.1.3.2 Shape. Pharmacophores capture 
the key features of intermolecular interac- 
tions. However, they do not explicitly capture 
the shape and volume of the ligand, even if this 
is crudely implied by the largest four-point 
pharmacophore exhibited, and the totality of 
potential pharmacophores exhibited across a 
range of conformations encodes shape frag- 
ments. Hahn (47) has described a method for 
three-dimensional shape-based searching implemented
in the Catalyst program. Seven
shape indices (the positive and negative extents
along the three principal axes from the molecular
centroid, and the volume of that conformer)
are computed and stored in a database.
These indices can then be used for rapid comparison
with a query shape derived from active
structures. Conformers passing this filter
are then aligned with the query and the similarity
is assessed from the volume overlap.
Shape-based searching can be used independently,
in which case it will complement a 2D
similarity search. The method can also be employed
in conjunction with a 3D pharmacophore
search; however, it is not clear that results
are improved in this case (48).

Figure 5.6. Overview of the Gridding and Partitioning (GaP) procedure as applied to monomers,
exemplified using phenylalanine as a potential primary amine. This molecule thus contains two
pharmacophoric groups (the aromatic ring and the carboxylic acid). During the conformational
analysis the locations of these pharmacophoric groups are tracked within a regular grid. See color
insert. [Reproduced from A. R. Leach and M. M. Hann, Drug Discovery Today, 5, 326-336 (2000),

2.1.3.3 Field-Based. A receptor site recognizes
the surface properties of a molecule.
These can be represented by different types of 
molecular fields, electrostatic, steric, and hy- 
drophobic, that can be calculated from the 
atomic composition of the molecule and compared
using a measure such as the Carbo index
(49). A Gaussian representation of the field allows
for a more rapid alignment of the molecules
(50). Willett’s group has developed a
program FBSS (51), which uses a genetic algo- 
rithm for the alignment of the molecular 
fields. They have compared the performance 
of this method with a 2D structural fingerprint
(UNITY software, 13c) in searching the
WDI, a collection of drug molecules and com- 
pounds in development, and the BIOSTER da- 
tabase, a database of functional groups that 
have been used to replace other groups and 
retain biological function (e.g., a carboxylic 
acid and a tetrazole). Although the 2D measure
will tend to find more bioactive molecules,
the 3D measure gives a greater structural
diversity in the hits (52). This seems to
be the case for most 3D methods. In these examples
conformational flexibility can be considered
during the alignment stage but will
slow the search down considerably and may
also lead to the algorithm becoming stuck in 
local minima. 

An alternative to using the molecule composition
in calculating the fields is to use molecular
fragments as probes to represent protein
side chains. The interaction energy
between the probe and the molecule is calculated
on a grid surrounding the molecule.
These grid fields can then be used in conjunction
with PLS as in the CoMFA (comparative
molecular field analysis) 3D-QSAR methodol- 
ogy (53). More recently, these fields have been 
further transformed to generate 3D molecular 
descriptors. The VolSurf program (54) calcu- 
lates a wide range of descriptors from the grid 
energies [calculated with the program GRID 
(42)]. These have been shown to correlate to a 
range of properties such as membrane pene- 
tration and solubility (55). The Almond pro- 
gram (56) uses a transform known as the 
Maximum Auto-Cross Correlation (MACC) 
between pairs of grid nodes, to give a type of 
two-point pharmacophoric representation of 
the fields. Such descriptors have been useful in 
QSAR studies because they are alignment 
free; that is, they are independent of the posi- 
tion within the defining grid, and have also 
been used in reactant selection (Pickett, un- 
published results, 1999). However, the limita- 
tions of the lack of conformational flexibility 
have so far precluded their use in more general
database searching and diversity applications. 

2.1.4 Analysis 

2.1.4.1 Descriptor Transformations. A large
number of potential descriptors are available 
and this presents a number of issues. Many 
descriptors will tend to be correlated with one 
another to a greater or lesser degree. There is
the question of the scale of the descriptors and 
also the difficulty of combining, say, a finger- 
print with a calculated property. Thus the de- 
scriptors must first be transformed in some 
way. A key study in this regard was the work 
of the Chiron group (57). Groups of similar 
descriptors were combined using principal 
components analysis (PCA) and multidimen- 
sional scaling (MDS), to give a total of 16 com- 
posite descriptors. D-optimal design was then 
used to further analyze a data set. Also of in- 
terest was the use of a "flower plot" to visual- 
ize the results. In the DPD (diverse-property 
derived) methodology (21a), the search was for 
six noncorrelated descriptors. The selection of 
relevant BCUT descriptors using a χ² test is
mentioned below. 

2.1.4.2 Similarity and Distance Measures. A
variety of measures exist for assessing the
similarity or distance between molecules in a 
given descriptor space (2a), as described 
above. Similarity measures give a direct measure
of similarity between molecules in some
property space and give values in the range of 
0 to 1, with 1 being identical. Typical examples 
are the Tanimoto coefficient and the Cosine 
coefficient. For real-valued properties the Tanimoto
coefficient is defined as

$$\mathrm{Tanimoto} = \frac{\sum_{i=1}^{N} x_{iA}\,x_{iB}}{\sum_{i=1}^{N} x_{iA}^{2} + \sum_{i=1}^{N} x_{iB}^{2} - \sum_{i=1}^{N} x_{iA}\,x_{iB}}$$

where $x_{iA}$ is the value of property $i$ of molecule
A. When $x_{i}$ can take values of only 0 or 1, as in a
bit string, this reduces to

$$\mathrm{Tanimoto} = \frac{c}{a + b - c}$$

where a is the number of on-bits in A, b is the
number of on-bits in B, and c is
the number of bits in common between A and
B. The Cosine coefficient can be defined as



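$$\mathrm{Cosine} = \frac{\sum_{i=1}^{N} x_{iA}\,x_{iB}}{\sqrt{\sum_{i=1}^{N} x_{iA}^{2}\,\sum_{i=1}^{N} x_{iB}^{2}}} = \frac{c}{\sqrt{ab}} \quad \text{(bit strings)}$$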



For field-based measures and the overlap of electron
density functions, the Carbo index
can be used (49); it is equivalent to the
Cosine coefficient.

Distance measures give 0 for identical 
structures and have an upper bound defined 
by the property space. The Euclidean and 
Hamming distances are the most common: 

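$$\text{Euclidean distance} = \sqrt{\sum_{i=1}^{N} (x_{iA} - x_{iB})^{2}} = \sqrt{a + b - 2c} \quad \text{(bit strings)}$$

$$\text{Hamming distance} = \sum_{i=1}^{N} \lvert x_{iA} - x_{iB} \rvert = a + b - 2c \quad \text{(bit strings)}$$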



The fundamental difference between similar- 
ity and distance measures is that the latter 



expressly include the absence of a feature (or 
low values for real-valued properties) in the 
measure of similarity. This has led to the sug- 
gestion (58) that, in the chemical domain at 
least, such measures are best for relative sim- 
ilarity; that is, ranking the similarity of two 
molecules to a target, as opposed to measuring 
the absolute similarity of molecules, for which
similarity measures are preferred.

Similarity and distance measures form the 
basis for most of the analysis and selection 
methods described in the next section and the 
reader is referred to the reviews by Willett et 
al. (2, and references therein) for a fuller dis- 
cussion of the characteristics and specific 
properties of these measures. 
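
The measures defined in this section can be written down directly. The following is a minimal sketch for equal-length descriptor vectors held as Python lists; the function names and the example bit strings are illustrative only, and vectors that are all zero are not handled.

```python
from math import sqrt


def tanimoto(x, y):
    """Tanimoto coefficient for real-valued or binary vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sum(a * a for a in x) + sum(b * b for b in y) - dot)


def cosine(x, y):
    """Cosine coefficient (equivalent in form to the Carbo index)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def euclidean(x, y):
    """Euclidean distance."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def hamming(x, y):
    """Hamming distance (sum of absolute differences)."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Bit strings with a = 3, b = 3 on-bits and c = 2 bits in common.
A = [1, 1, 0, 1, 0, 0]
B = [1, 0, 0, 1, 1, 0]
print(tanimoto(A, B))   # c / (a + b - c) = 2 / 4
print(cosine(A, B))     # c / sqrt(a * b) = 2 / 3
print(hamming(A, B))    # a + b - 2c = 2
print(euclidean(A, B))  # sqrt(a + b - 2c)

# Two sparse bit strings with nothing in common: the Tanimoto coefficient
# is 0, yet the Hamming distance equals that of the pair above.
P = [1, 0, 0, 0, 0, 0]
Q = [0, 1, 0, 0, 0, 0]
print(tanimoto(P, Q), hamming(P, Q))
```

The last pair illustrates the point made above: because shared off-bits count toward agreement, a distance measure can rate two sparse molecules with no common features as being as close as two molecules that share most of their features.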

2.2 Analysis and Selection Methods 

In this section we describe some general meth- 
ods for analyzing and partitioning large data 
sets, with particular reference to selecting representative
or diverse subsets. Library design
also employs many of the strategies described 
here and is discussed in more detail in Section 
4. The methods fall into two broad categories: 
cell-based or partitioning methods and dis- 
tance-based methods. Partitioning methods 
use the population to define the limits for cells 
into which the compounds are divided. Adding 
or comparing to other compound sets requires 
identifying the cells into which the new com- 
pounds would fall based on their descriptors. 
This is very rapid and the partitioning process 
provides a frame of reference for many design 
tasks; for example, compounds can be readily 
identified to fill empty or poorly represented 
cells. Potential issues are where to place the 
cell boundaries and the handling of com- 
pounds that fall near to a cell boundary. Also, 
new compounds may fall outside the range of 
properties of the initial population. Distance- 
based methods, such as clustering and dissim- 
ilarity-based methods, require the calculation 
of similarity between members of the popula- 
tion and are thus population dependent. Add- 
ing new members to the population requires 
recalculating similarities and could change 
the distribution of compounds between the 
clusters. Identifying poorly represented or 
empty areas of property space is not possible. 
Each of these methods is further described below
with examples of their application.

2.2.1 Cell-Based Partitioning Methods. Partitioning
methods divide chemistry space into
hyperdimensional "cells" by "binning" the 
axes (descriptors) that define the chemistry 
vector space, just as the eight divisions on the 
x- and y-axes of a two-dimensional checkerboard
divide the board into 64 squares. A
chemical compound occupies a position in 
chemistry space determined by the descrip- 
tors (coordinates) computed based on its 
structure. Once the compounds have been 
partitioned, selecting diverse or representa- 
tive sets of compounds involves selecting a 
small number of compounds from each occu- 
pied cell, either in proportion to the number of 
compounds in the cell or a specified number 
from each occupied cell. For focused sets, compounds
are sampled from cells neighboring
the population of actives. The real advantage 
of partitioning methods, however, lies in their 
ability to readily identify underpopulated regions
of property space. Selections can then be
made from a second population of molecules
(a virtual library, for instance) to increase
the occupancy of underpopulated cells.
Usually, such methods require a low dimen- 
sional representation of the space, although 
the pharmacophore methods are a notable ex- 
ception to this. The low dimensional space 
may be the result of a dimensionality-reduction
algorithm, as described earlier. Alternatively,
a small number of descriptors may be
judiciously selected. This latter approach was 
taken by Lewis et al. in their DPD methodol- 
ogy (21a), which is a good example of parti- 
tion-based selection. The aim was to select a 
representative set of compounds based on mo- 
lecular and physicochemical properties for 
screening. Six properties were chosen from 
among 49, based on their low pairwise correlation:
number of H-bond acceptors, number
of H-bond donors, molecular flexibility, an
electrotopological state index, c log P, and a
measure of aromatic density. Each descriptor
(axis) was divided into two to four partitions,
to give a total of 576 bins. A major issue was in
identifying six relevant and reasonably noncorrelated
(orthogonal) descriptors, leading to
the definition of a new descriptor. The chosen
ranges covered more than 85% of a 150,000-compound
subset of the corporate collection and approximately
three compounds were taken from
each bin. Follow-up of initial hits involves the 
screening of additional compounds from the 
cells containing hit molecules. Several leads 
were identified using this approach (7). 
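
The cell-based selection described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the DPD implementation: the function names, the bin boundaries, and the descriptor values are hypothetical, and a fixed number of compounds is drawn at random from each occupied cell.

```python
import random
from collections import defaultdict
from itertools import product


def assign_cell(descriptors, edges):
    """Map a descriptor vector to a tuple of bin indices, one per axis.

    edges[k] holds the interior bin boundaries for axis k, so an axis with
    two boundaries is split into three partitions.
    """
    return tuple(sum(value >= b for b in axis_edges)
                 for value, axis_edges in zip(descriptors, edges))


def partition_select(compounds, edges, per_cell=3, seed=0):
    """Pick up to `per_cell` compounds at random from every occupied cell."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for name, descriptors in compounds.items():
        cells[assign_cell(descriptors, edges)].append(name)
    selected = []
    for cell in sorted(cells):
        members = cells[cell]
        selected.extend(rng.sample(members, min(per_cell, len(members))))
    # Cells with no compounds mark underpopulated regions of the space.
    all_cells = set(product(*(range(len(e) + 1) for e in edges)))
    empty = all_cells - set(cells)
    return selected, cells, empty


# Hypothetical two-descriptor example (values invented for illustration).
compounds = {
    "cpd1": (1.2, 0.1), "cpd2": (1.4, 0.2), "cpd3": (1.1, 0.3),
    "cpd4": (1.3, 0.15), "cpd5": (3.9, 0.9), "cpd6": (2.5, 0.6),
}
edges = [(2.0, 3.0), (0.5,)]  # 3 partitions x 2 partitions = 6 cells
selected, cells, empty = partition_select(compounds, edges, per_cell=3)
```

The set of empty cells returned here is what would guide selection from a second population such as a virtual library. As an arithmetic aside, six axes with 2, 2, 3, 3, 4, and 4 partitions give 2 x 2 x 3 x 3 x 4 x 4 = 576 cells, which is the only combination of two to four partitions per axis that yields 576 bins on six axes.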

2.2.1.1 DiverseSolutions. DiverseSolutions
(DVS) is software developed by Pearlman et 
al. (11, 31) to generate and use the BCUT de- 
scriptors in addition to other DVS-computed 
or user-provided low dimensional descriptors. 
(DiverseSolutions is also designed to work 
with high dimensional metrics such as 2D fin- 
gerprints, and includes some novel algorithms 
for such distance-based work.) DVS uses a
χ²-based "auto-choose" algorithm (11c) to
identify the combination of low-D descriptors,
which are mutually orthogonal and which 
most uniformly distribute a given large popu- 
lation of compounds among the cells of the 
resulting chemistry space. Originally, the bin- 
ning was performed in a uniform manner 
along each axis, with a given percentage of 
outliers to avoid sampling the extremes of 
space. This could be useful for large sets of 
diverse compounds where the extremes tend 
to be undesirable compounds. For large
(virtual) libraries, however, initial filtering
can remove these before the analysis. A
nonuniform binning scheme was therefore
suggested (59), so that acceptable compounds are
not lost as outliers; this is now the preferred
option.
Often, the large population of compounds
used as the basis for defining a chemistry 
space is the entire compound collection avail- 
able to a pharmaceutical company for its drug 
discovery efforts, together optionally with 
structures from commercial databases of bio- 
logically active compounds. The resulting 
chemistry space can be regarded as the "cor- 
porate standard chemistry space" and pro- 
vides an ideal basis for comparing large sets of 
compounds such as alternative commercially 
available compound collections or alternative 
combinatorial libraries. It is also a good basis 
for comparing small sets of compounds such as 
compounds with reasonable affinity for vari- 
ous bioreceptors. 

The axes of a corporate standard chemistry 
space are intended to represent all aspects of 
molecular structure. Thus, all axes of the cor- 
porate chemistry space must be considered for 
purposes such as general diverse subset
selection or rational compound acquisition.
However, not all aspects of molecular structure
may be important for understanding struc- 
ture-activity relationships (SARs) for a partic- 
ular receptor. This led Pearlman and Smith 
(11d) to introduce the concept of a
receptor-relevant subspace (RRSS) of a full chemistry
space. For example, starting with a chemistry 
space of six dimensions, defined to best repre- 
sent the diversity of all druglike compounds in 
the MDDR (MDL Drug Data Report) database 
(13a), they showed how to perceive the three- 
dimensional subspace that conveys informa- 
tion that is particularly relevant for affinity to 
the ACE (angiotensin-converting enzyme)
receptor. ACE inhibitors of diverse structure
were tightly clustered with respect to the re- 
ceptor-relevant metrics, thereby providing an 
obvious near-neighbor strategy for lead
follow-up. They (11d) also emphasized the
importance of not considering metrics that are
not "receptor-relevant" when computing dis- 
tances for such near-neighbor-based discovery 
efforts. This also enables diversity in these 
other dimensions to be explored (e.g., with 
combinatorial libraries), to obtain compounds 
with a modified profile for other properties 
such as bioavailability. 

Work on the design and diversity analysis 
of large combinatorial libraries at Pharmaco- 
peia using BCUT metrics and DiverseSolu- 
tions was reported by Schnur (32). A cell- 
based analysis of synthon-derived libraries 
was performed, using full product libraries, in- 
cluding library comparisons. Active molecules 
in these libraries, which involved multiple 
scaffolds, were found to cluster in various 
three-dimensional subspaces of the diversity 
spaces. The utility of a simple property-based 
reactant/synthon selection tool was also de- 
scribed, targeted at the synthetic chemists, 
with reactants binned according to patterns 
based on the ranges of a set of user-selected 
properties that form a diversity hypothesis. 

Chemistry space metrics have been used at 
Rhone-Poulenc Rorer for diversity analysis, li- 
brary design, and compound selection (59, 80) 
using DiverseSolutions to generate a
"universal" chemistry space for use as a standard for
profiling structural sets of interest. The 
complementarity of three different diversity 
measures for comparing and profiling
compound collections (a corporate database,
combinatorial libraries, and the MDDR drugs
database) was also shown. The methods used
were a 2D structural characterization
(Daylight fingerprints), DiverseSolutions, and 3D
pharmacophore fingerprints. A combinatorial
library of 100,000 structures appeared struc- 
turally different from the other databases by 
the Daylight fingerprint clustering, yet the 
bulk of its compounds overlapped with drug- 
like compounds (MDDR) in DiverseSolutions 
BCUT chemistry space and 3D pharmacoph- 
ore space ("cells" in fingerprints). It was 
shown and "quantified" that new diversity rel- 
ative to the company database was explored, 
with much of this new diversity in desirable 
areas occupied by MDDR compounds. The 
nonuniform binning scheme was developed to 
enable the use of chemistry spaces scaled to 
include all structures within a set, while main- 
taining a reasonable distribution of com- 
pounds within cells. The method was used to 
select a subset for initial screening of a large 
set of combinatorial libraries designed for 
7-TM GPCR targets. 

2.2.1.2 Pharmacophore Fingerprints.
Pharmacophore fingerprints can also be considered
as a high dimensional partitioning of the com- 
pound space (35). Underrepresented pharma- 
cophores within a population can be identified 
and act as a possible focus for library design or 
compound acquisition. Using six feature types 
(hydrogen bond acceptor, donor, acid, base, 
hydrophobe, and aromatic ring centroid) with 
four-point pharmacophores and 7-10 binned 
distance ranges, it is possible to resolve about 
2-10 million different pharmacophoric shapes.
Different databases can be compared using 
this fingerprint, and differences identified. For 
example, by comparing a corporate screening 
file (100,000 structures) with the MDDR data- 
base (62,000 structures) of biologically active 
compounds (as discussed above for Diverse- 
Solutions, Refs. 62, 80) "holes" could be iden- 
tified, in terms of about 1 million 3D pharma- 
cophores exhibited only by MDDR compounds 
(about 2.7 million were in common and 0.2 
million unique to the corporate set). This pro- 
vides a design space for which combinatorial 
libraries were designed and synthesized. A
total of 100,000 combinatorial library
compounds were able to match about 40% (0.4
million) of the pharmacophore "holes" (i.e.,
MDDR pharmacophores not in corporate set),
and additionally explore about 0.3 million new
pharmacophores. Figure 5.7 illustrates the
number of pharmacophores found in these
sets, together with those for the ACD
(Available Chemicals Directory), random 14,000
subsets of the database sets, and some of the
combinatorial libraries (~14,000 each, from a
four-component Ugi condensation reaction,
12 X 12 X 12 X 8 reactants). The relative
richness and diversity of the MDDR database,
which includes structures from a large
number of companies, is clear from the
comparisons. The contributions, and eventual
diminishing return, of successive libraries using the
same chemistry are discussed in Section 5.1.2
(see Fig. 5.24 below).

Figure 5.7. Comparisons of the 3D four-point pharmacophore fingerprints exhibited by several sets
[MDDR database of 62,000 biologically active compounds, a corporate registry database of 100,000
compounds used for screening, 100,000 compounds from combinatorial libraries (from a
four-component Ugi condensation reaction), and 14,000 compound random subsets (MDDR, corporate) or
individual libraries]. The four-point potential pharmacophores were calculated using 10 distance
range bins and the standard six pharmacophore features.
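
The "holes" analysis amounts to set arithmetic on the pharmacophore keys present in each collection. The sketch below assumes each fingerprint has already been reduced to a set of integer keys; the function name and the toy data are hypothetical, chosen only so that the proportions echo the figures quoted above.

```python
def compare_fingerprints(reference, corporate, library):
    """Summarize the overlap between three sets of pharmacophore keys."""
    holes = reference - corporate            # exhibited only by the reference set
    common = reference & corporate
    corporate_only = corporate - reference
    filled = library & holes                 # holes matched by the new library
    novel = library - (reference | corporate)
    return {
        "holes": len(holes),
        "common": len(common),
        "corporate_only": len(corporate_only),
        "holes_filled": len(filled),
        "fraction_filled": len(filled) / len(holes) if holes else 0.0,
        "novel": len(novel),
    }


# Toy integer keys standing in for four-point pharmacophore bits (invented data).
reference = set(range(0, 37))    # plays the role of the MDDR set
corporate = set(range(10, 39))   # plays the role of the corporate screening file
library = {0, 1, 2, 3, 20, 21, 100, 101, 102}
summary = compare_fingerprints(reference, corporate, library)
```

With these toy sets the library fills 4 of 10 holes (40%) and contributes 3 keys seen in neither reference collection.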

An example of the use of 3D pharmaco- 
phore fingerprints for the design of GPCR li- 
braries (37a) using "relative" fingerprints fo- 
cused around privileged substructures is 
described in Section 5.1.2. An approach that 
combines an optimization of a four-point 
pharmacophore fingerprint and BCUT
chemistry space diversity, using simulated
annealing, has been described (37d; see Section 4.7).
Simulated annealing is a widely used
optimization methodology whereby the
"temperature" of the system is used to control the
degree of sampling of solution space. The
"temperature" is cooled or annealed as the 
run progresses so that the system moves into a 
minimum for the function at low "tempera- 
ture." In the classical sense, temperature con- 
trols the kinetic energy of the system; in a 
more general sense, the "temperature" has no 
physical meaning and is a parameter to con- 
trol the sampling of solution space. Diversity 
was the goal (function to be optimized) of the 
studies reported, but the approach can equally 
be applied to optimize to a desired distribution 
of properties (e.g., from sets of biologically
active compounds). The power of this
pharmacophoric approach has been exemplified by
Leach et al. in their GaP protocol for monomer 
acquisition (44). 
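
The role of the "temperature" can be illustrated with a small simulated annealing loop that optimizes the diversity of a fixed-size subset. This is a generic sketch, not the published method: the diversity score (mean nearest-neighbor distance), the geometric cooling schedule, the single-swap move, and all names and parameter values are assumptions made for illustration.

```python
import math
import random


def mean_nearest_neighbor_distance(points, subset):
    """Diversity score: average distance from each member to its closest other member."""
    total = 0.0
    for i in subset:
        total += min(math.dist(points[i], points[j]) for j in subset if j != i)
    return total / len(subset)


def anneal_diverse_subset(points, k, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Search for a diverse subset of size k (2 <= k < len(points))."""
    rng = random.Random(seed)
    current = rng.sample(range(len(points)), k)
    score = mean_nearest_neighbor_distance(points, current)
    best, best_score = list(current), score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Move: swap one selected compound for one that is not selected.
        outside = [i for i in range(len(points)) if i not in current]
        trial = list(current)
        trial[rng.randrange(k)] = rng.choice(outside)
        trial_score = mean_nearest_neighbor_distance(points, trial)
        delta = trial_score - score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, score = trial, trial_score
            if score > best_score:
                best, best_score = list(current), score
    return best, best_score


# Hypothetical 2D descriptor space: a 6 x 6 grid of compounds.
grid = [(float(x), float(y)) for x in range(6) for y in range(6)]
subset, diversity = anneal_diverse_subset(grid, k=4)
```

At high temperature most swaps are accepted, so the search samples solution space broadly; as the temperature falls, only swaps that improve (or barely worsen) the score survive. Replacing the score with a measure of agreement to a target property distribution would adapt the same loop to the non-diversity goals mentioned above.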

Pharmacophore fingerprints derived from 
complementary site points to a target binding 
site have been used as a quantification of
"biological diversity"/structure-based diversity
(37), defining a measure of the intersection
between chemical and biological space. They 
can be compared to the pharmacophore finger- 
prints calculated from ligands, and the phar- 
macophore fingerprints of different target 
binding sites can also be compared to identify 
similarities (e.g., common binding motifs) and 
differences (e.g., for selectivity). The four- 
point pharmacophore fingerprint of a serine 
protease binding site was used to quantify all 
the possible binding modes. An example was 
given of how a combinatorial library could be 
designed to match as many as possible of these 
site pharmacophores, with the idea that the 
biological screening of the resultant library 
would provide information as to which hy- 
potheses lead to (the best) binding. The site 
points can be generated by both geometric 
methods (as implemented in Chem-X/Chem- 
Protein; see Ref. 133) or through energetic 
surveys of the site [e.g., by using a variety of 
probe atoms (as implemented and used for 
pharmacophore fingerprint generation) (37); 
see Section 2.1.3.1].

The pharmacophore fingerprinting method 
thus provides a novel method to measure 
similarity when comparing ligands to their 
binding site targets, with applications such 
as virtual screening and structure-based 
combinatorial library design, as well as to 
compare binding sites themselves. Flexibil- 
ity of the binding site can also be explicitly 
accounted for by using a composite finger- 
print generated from several different bind- 
ing site conformations. 

2.2.2 Cluster-Based Methods. Clustering 
methods have a long history of application in 
chemical information (60). Any set of descrip- 
tors can be used in the clustering, but most 
typically some form of structural fingerprint is 
used in conjunction with a similarity measure 
such as the Tanimoto coefficient (see Section 
2.1.4.1). The methods fall into two broad
classes, hierarchical and nonhierarchical. 

Nonhierarchical methods such as that de- 
scribed by Jarvis and Patrick (61) have been 
widely used for compound selection from large 
databases (62). The principle behind the 
Jarvis-Patrick method is to group together 
compounds that have a large number of
nearest neighbors in common. However, the
method requires the user to specify the
number of clusters desired, and tends to be prone
to singletons (clusters of one) and/or a small 
number of very large clusters. The cascade 
clustering methodology (59b) was developed 
to address some of these issues. Parameters 
were selected to produce an acceptable size 
distribution for the largest clusters and the 
small clusters were then reclustered. Doman 
et al. (63) have developed a fuzzy clustering 
technique, also based around the Jarvis- 
Patrick algorithm but which has no user-de- 
fined parameters and allows a compound to 
belong to more than one cluster. 
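
The shared-nearest-neighbor principle can be sketched as follows. This is one common formulation of Jarvis-Patrick clustering, written under assumptions: fingerprints are represented as sets of "on" bits, similarity is the Tanimoto coefficient, and the parameter names k (length of each neighbor list) and k_min (required number of shared neighbors) are illustrative choices rather than terms taken from the text.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints stored as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def jarvis_patrick(fingerprints, k=3, k_min=2):
    """Group compounds that are mutual near neighbors sharing >= k_min neighbors."""
    names = list(fingerprints)
    neighbors = {}
    for name in names:
        ranked = sorted(
            (other for other in names if other != name),
            key=lambda other: tanimoto(fingerprints[name], fingerprints[other]),
            reverse=True,
        )
        neighbors[name] = set(ranked[:k])

    # Union-find structure used to merge compounds into clusters.
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            mutual = a in neighbors[b] and b in neighbors[a]
            shared = len(neighbors[a] & neighbors[b])
            if mutual and shared >= k_min:
                parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for name in names:
        clusters[find(name)].append(name)
    return sorted(sorted(members) for members in clusters.values())


# Invented fingerprints: two families of analogs plus one unrelated compound.
fingerprints = {
    "a1": {1, 2, 3, 4}, "a2": {1, 2, 3, 5}, "a3": {1, 2, 4, 5}, "a4": {1, 3, 4, 5},
    "b1": {10, 11, 12, 13}, "b2": {10, 11, 12, 14},
    "b3": {10, 11, 13, 14}, "b4": {10, 12, 13, 14},
    "x": {50, 51},
}
clusters = jarvis_patrick(fingerprints, k=3, k_min=2)
```

The unrelated compound ends up as a singleton, which illustrates the tendency toward clusters of one noted above.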

Hierarchical methods can be further subdi- 
vided into agglomerative and divisive meth- 
ods. Agglomerative methods start with each 
compound in a separate cluster and iteratively 
join the closest clusters together. Divisive 
methods start with a single cluster and itera- 
tively subdivide until each compound is a sin- 
gleton. Hierarchical clustering methods gen- 
erate a dendrogram showing the relationship 
between the compounds, the issue being the 
level at which to cut the hierarchy (i.e., how 
many clusters to generate). Although heuris- 
tics exist, there is no automated method. Such 
algorithms, however, at best scale to order 
(N²) in time, where N is the number of
compounds, and so are limited in application to a
few hundred thousand compounds at most 
(64). Nevertheless, they have been shown to 
be superior to nonhierarchical methods for 
clustering of chemical compounds (65). 
Ward's method was shown (5) to be the most 
effective at separating active from inactive 
compounds by clustering bit strings that de- 
scribe the presence or absence of 153 small 
generic and specific fragments (ISIS struc- 
tural key descriptors). Even better perfor- 
mance was obtained with the inclusion of 
pharmacophore distances between site points 
complementary to hydrogen bonding and 
charged groups combined with distances be- 
tween centers of aromatic rings and attach- 
ment points for hydrophobic groups. 

2.2.3 Dissimilarity-Based Methods. The meth- 
ods for compound selection described above 
essentially group compounds either by
partitioning into cells or by clustering.
Dissimilarity-based methods (66) avoid this step.
The subset selection can be performed
iteratively. The first compound is chosen at
random and the next compound is selected to be
maximally dissimilar to the first; the third is 
then selected to be maximally dissimilar to the 
first two, and so on. The selection stops when a 
prespecified number of compounds have been 
selected or no more compounds can be chosen 
that are below a given similarity or above a 
certain distance to another compound in the 
selected set. Pearlman (11b,c) refers to such
methods as "addition" algorithms because 
they add compounds to a diverse set of increas- 
ing size. He notes that such algorithms are 
quite satisfactory when the size of the desired 
subset is relatively modest but, given that the 
time required for such algorithms is propor- 
tional to the size of the total population and 
the square of the size of the desired diverse 
subset, they are far less satisfactory when, for 
example, selecting a subset of 10,000 from a 
population of 1,000,000. 
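
A naive "addition" algorithm can be written as below. The sketch assumes that "maximally dissimilar" means maximizing the distance to the nearest already-selected compound (the MaxMin criterion) and uses Euclidean distance in a low-dimensional descriptor space; the function name and the optional `first` argument are hypothetical conveniences.

```python
import math
import random


def max_dissimilarity_selection(points, n_select, first=None, seed=0):
    """Grow a diverse subset one compound at a time."""
    rng = random.Random(seed)
    start = rng.randrange(len(points)) if first is None else first
    selected = [start]
    while len(selected) < min(n_select, len(points)):
        best_index, best_distance = None, -1.0
        for i in range(len(points)):
            if i in selected:
                continue
            # Distance from candidate i to its closest selected compound.
            nearest = min(math.dist(points[i], points[j]) for j in selected)
            if nearest > best_distance:
                best_index, best_distance = i, nearest
        selected.append(best_index)
    return selected


# Invented one-dimensional descriptor values.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (20.0,)]
picked = max_dissimilarity_selection(points, 3, first=0)
```

Each pass compares every remaining compound with every compound already chosen, so the total work grows with the population size times the square of the subset size, which is the scaling behavior described above.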

Alternatively, the number of desired com- 
pounds can be predefined and a stochastic al- 
gorithm used to maximize the diversity of the 
selected set, although these methods are even 
slower than addition methods. Sphere-exclu- 
sion methods, which Pearlman calls "elimina- 
tion" algorithms because the diverse subset is 
created by eliminating compounds from the 
superset, have been implemented in
DiverseSolutions (31) (see Section 2.2.1.1), providing
a rapid distance-based diverse subset selection
method. The minimum distance between
nearest neighbors within the diverse subset is
first defined (D_min). A compound is chosen at
random, all compounds within D_min are
removed, a second compound is chosen at
random, all compounds within D_min are removed,
and this is repeated until no more compounds
can be chosen. The algorithm controls the size
of the resulting diverse subset by
automatically repeating the process with a larger or
smaller value of D_min as necessary. Because
the time required for each elimination sub- 
set is proportional to the size of the superset 
and size of the subset (not size of subset 
squared), the elimination method, despite 
the need for (typically) four or five automatic 
repetitions, is far faster than the addition 
method and yields subsets of essentially 
equal diversity. 
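
An "elimination" algorithm of this kind might look like the following sketch. It is not the DiverseSolutions code: the bisection used to adjust the exclusion radius, the tolerance on subset size, and all names are assumptions introduced for illustration.

```python
import math
import random


def sphere_exclusion(points, d_min, rng):
    """Pick compounds at random, discarding everything within d_min of each pick."""
    remaining = list(range(len(points)))
    selected = []
    while remaining:
        pick = rng.choice(remaining)
        selected.append(pick)
        # Keep only compounds farther than d_min from the pick (the pick itself is dropped).
        remaining = [i for i in remaining
                     if math.dist(points[i], points[pick]) > d_min]
    return selected


def select_by_elimination(points, target, d_low, d_high,
                          tolerance=0, max_rounds=20, seed=0):
    """Repeat sphere exclusion, adjusting the radius until the subset size fits."""
    rng = random.Random(seed)
    subset, d_min = [], d_high
    for _ in range(max_rounds):
        d_min = (d_low + d_high) / 2
        subset = sphere_exclusion(points, d_min, rng)
        if abs(len(subset) - target) <= tolerance:
            break
        if len(subset) > target:
            d_low = d_min    # too many compounds: enlarge the exclusion radius
        else:
            d_high = d_min   # too few compounds: shrink the exclusion radius
    return subset, d_min


# Invented data: 100 compounds evenly spaced along one descriptor axis.
line = [(float(i),) for i in range(100)]
subset, d_min = select_by_elimination(line, target=10, d_low=0.0,
                                      d_high=100.0, tolerance=1)
```

Each elimination pass touches every remaining compound once per pick, so its cost grows with the superset size times the subset size, consistent with the comparison above.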



Maximum dissimilarity-based methods tend
to give diverse selections, including many out- 
liers of less potential interest to a medicinal 
chemist. By contrast, methods such as sphere 
exclusion (minimum dissimilarity selection) 
tend to give representative selections that 
mimic the underlying distribution of com- 
pounds. The OptiSim method developed by 
Clark (66c) attempts to achieve a compromise 
between these two extremes. Three parame- 
ters are required: the first two, radius or sim- 
ilarity cutoff and maximum number of selec- 
tions are common to the other algorithms. A 
third parameter, K, is required to define a sub- 
sample size. Up to K selections are added to 
the subsample at each iteration and the best
compound from the subsample added to the 
selected set. At the limit of K = 1, this is
equivalent to minimum dissimilarity selection,
whereas at the limit of K = N (total number of 
compounds) the algorithm is equivalent to 
maximum dissimilarity selection. By altering 
the value of K the user can thus achieve a 
compromise between the diversity and repre- 
sentativeness of the selected set. Tests of the 
algorithm suggest that it is possible to achieve 
selections similar to those achieved from hier- 
archical clustering methods at a greatly re- 
duced computational cost. Maximum dissimi- 
larity methods were shown (66d) to lead to 
more stable QSAR models with higher
predictive power, based on a comparative molecular field
analysis of angiotensin-converting enzyme 
inhibitors. 
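
The interplay of the three parameters can be sketched as below. This is a simplified reading of the subsample-based scheme described above, not Clark's implementation: the handling of unused subsample members (set aside and reused once the pool is exhausted), the Euclidean distance, and all names are assumptions.

```python
import math
import random


def optisim(points, k, radius, max_select, seed=0):
    """Select compounds by taking the best of up to k random candidates per step."""
    rng = random.Random(seed)
    pool = list(range(len(points)))
    rng.shuffle(pool)
    selected = [pool.pop()]
    recycle = []

    def distance_to_selected(i):
        return min(math.dist(points[i], points[j]) for j in selected)

    while len(selected) < max_select:
        subsample = []
        while len(subsample) < k and pool:
            candidate = pool.pop()
            # Candidates closer than `radius` to the selected set are discarded.
            if distance_to_selected(candidate) >= radius:
                subsample.append(candidate)
        if not subsample:
            if not recycle:
                break                      # nothing left to choose from
            pool, recycle = recycle, []    # reuse candidates set aside earlier
            rng.shuffle(pool)
            continue
        best = max(subsample, key=distance_to_selected)
        selected.append(best)
        subsample.remove(best)
        recycle.extend(subsample)
    return selected


# Invented 2D descriptor space: a 10 x 10 grid of compounds.
grid = [(float(x), float(y)) for x in range(10) for y in range(10)]
chosen = optisim(grid, k=5, radius=2.0, max_select=12)
```

With k = 1 every acceptable random candidate is taken, as in sphere exclusion; with k equal to the population size each step examines all remaining candidates and takes the most distant one, as in maximum dissimilarity selection.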

Hudson et al. (67) describe two parameter- 
based methods for compound selection. The 
most descriptive compound (MDC) method is 
aimed at selecting compounds that represent 
the population as a whole. An information vec- 
tor is accumulated from the ranked Euclidean 
distance of each compound in the data set to 
all others. The most descriptive compound is 
that with the largest information, which 
equates to the compound with the smallest 
overall distance to all other compounds. The 
next compound is chosen to give the greatest 
additional information and so on. The sphere- 
exclusion method used attempts to select com- 
pounds that most effectively cover the prop- 
erty space. A compound is selected, say the 
MDC, and all compounds are removed that are 
closer to it than a user-defined radius. The
next selection is that compound in the
remaining set that is closest to the one already
selected. This process is repeated until no
compounds are left for selection. The methods
were applied to the selection of standard sets 
for biological screening at Wellcome. 

The maxmin approach (23c) uses the short- 
est nearest-neighbor distance as a measure of 
diversity in the sample: 

$$D_{\mathrm{maxmin}} = \frac{1}{N}\sum_{i}\max\Bigl[\min_{j \neq i}(d_{ij})\Bigr]$$

This measure is particularly useful in select- 
ing diverse compound sets from a corporate 
collection, as exemplified by Higgs et al. (68). 
They also introduce the concept of a coverage 
design for lead follow-up, in which compounds 
are selected to be maximally similar to a set of 
leads. 

An alternative approach is to use the sum 
of pairwise similarities in the maxsum ap- 
proach: 

$$\sum_{i}^{N}\sum_{j}^{N}\mathrm{sim}(i,j)$$



This approach is particularly efficient when 
combined with the Cosine coefficient (69) and 
was used by Pickett et al. in combination with 
pharmacophore descriptors (70). In lower di- 
mensional spaces the maxsum measure tends 
to force selection from the corners of diversity 
space (6b, 71) and hence maxmin is the pre- 
ferred function in these cases. A similar con- 
clusion was drawn from a comparison of algo- 
rithms for dissimilarity-based compound 
selection (72). 
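
The tendency of a sum-based function to favor the edges of a low-dimensional space can be seen in a small exhaustive comparison. The sketch below is an illustration under assumptions: it scores subsets by distance (dissimilarity) rather than similarity, uses a single invented descriptor axis, and enumerates all subsets, which is feasible only for toy data.

```python
from itertools import combinations


def min_nearest_neighbor(values, subset):
    """Shortest distance between any two members of the subset."""
    return min(abs(values[i] - values[j]) for i, j in combinations(subset, 2))


def sum_pairwise(values, subset):
    """Sum of all pairwise distances within the subset."""
    return sum(abs(values[i] - values[j]) for i, j in combinations(subset, 2))


def best_subset(values, size, score):
    """Exhaustively find the subset of the given size with the highest score."""
    return max(combinations(range(len(values)), size),
               key=lambda subset: score(values, subset))


# Invented one-dimensional descriptor values between 0 and 1.
values = [0.0, 0.02, 0.33, 0.67, 0.98, 1.0]
maxmin_pick = best_subset(values, 4, min_nearest_neighbor)
maxsum_pick = best_subset(values, 4, sum_pairwise)
```

Here the sum-based score selects the two pairs of near-duplicates at the extremes (0.0, 0.02, 0.98, 1.0), whereas the nearest-neighbor score selects an evenly spread set (0.0, 0.33, 0.67, 1.0).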

An excellent discussion of different diver- 
sity functions has been given by Waldman et 
al. (73). A set of ideal behaviors for a diversity 
function was defined. These are particularly 
relevant to library design tasks. Thus, al- 
though maxmin is suitable for selecting highly 
diverse molecules, it is less well suited to li- 
brary design optimization processes. An alter- 
native function was defined based on mini- 
mum spanning trees, which had previously 
been used by Mount et al. (71) and a gaussian 
error function, erf( ): 



$$D = \sum_{i}\operatorname{erf}(\alpha d_i)$$

The reader is referred to reviews (e.g., see 
Refs. 2, 6, 7) for further detail and discussion 
of other related measures and methods for di- 
versity-based selection. 

2.2.4 Biasing to Desired/Desirable
Properties. Any of the methods above can be used to
bias the compound selection toward a particu- 
lar region of property space, for example, by 
restricting the selection to cells or clusters 
containing known actives (in the latter case it 
may mean reclustering if the active is from an 
external source). However, it is a common ex- 
perience when applying diversity-based selec- 
tion to large databases to see a number of com- 
pounds that are undesirable for a number of 
reasons. They may be too large, too flexible, 
too lipophilic, contain too many acid groups, 
and so forth. Thus, it is general practice to 
apply filters during the selection process. 
These can include limits on property values 
such as calculated log P and molecular weight 
(68) and the application of substructure filters 
to remove undesirable or reactive compounds 
(21a, 66a, 74). Lipinski (29) formalized some 
of these ideas in the Rule-of-5, derived from 
analysis of orally absorbed drugs, with prob- 
lems more likely if two or more of MW > 500, 
c logP > 5, sum NH, OH (~ H-bond donors) > 
5, sum N, O (~ H-bond acceptors) > 10.
Variants of this, often stricter (e.g., only one
violation and/or lower values for hits/leads), are
now widely used in conjunction with other 
methods for classification of compounds as 
likely to be orally absorbed or to penetrate the 
CNS. The reader is directed to several reviews 
on this topic (75). The usefulness of such ap- 
proaches has been shown by the work of Pick- 
ett et al. (76), where a library was designed 
using simple descriptors such as polar surface 
area for oral absorption (77). The designed li- 
brary showed improved absorption in a Caco-2 
system over a previous related library where 
the products had not been formally designed 
to these criteria. 
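
A property filter based on the four thresholds quoted above can be sketched as follows. The property values are assumed to have been computed elsewhere; the class, the function names, and the example compounds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    mol_weight: float
    clogp: float
    h_bond_donors: int      # sum of NH and OH groups
    h_bond_acceptors: int   # sum of N and O atoms


def rule_of_five_violations(p):
    """Count how many of the four thresholds a compound exceeds."""
    return sum([
        p.mol_weight > 500,
        p.clogp > 5,
        p.h_bond_donors > 5,
        p.h_bond_acceptors > 10,
    ])


def passes_filter(p, max_violations=1):
    """Problems are more likely with two or more violations, so allow at most one."""
    return rule_of_five_violations(p) <= max_violations


# Invented property values for three hypothetical compounds.
library = {
    "small_polar": Properties(320.4, 2.1, 2, 5),
    "heavy_only": Properties(540.0, 4.2, 3, 8),
    "large_greasy": Properties(712.9, 6.8, 4, 12),
}
kept = [name for name, props in library.items() if passes_filter(props)]
```

Setting max_violations=0 gives one of the stricter variants mentioned above.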

In a more general sense, compounds can be
selected to reproduce a given set of property
profiles for calculated log P, molecular weight,
and so forth derived from, say, a set of known
drugs. Such an approach is most widely used
as an additional constraint in library design
algorithms (78) and is further discussed
below.

Figure 5.8. Example of privileged four-point pharmacophores, either created from a ligand using a
particular feature (e.g., the centroid of a "privileged" substructure) or complementary to a protein
site using a site point or attachment point of a docked scaffold. Only pharmacophores that include
this special feature are included in the fingerprint, thus providing a relative measure of
diversity/similarity with respect to the privileged feature.

An interesting example of biasing in com- 
pound selection is provided by Grassy et al. 
(79). Lead compounds were used to derive a 
range of acceptable values for topological
indices and other molecular descriptors. These
were used to filter a large virtual library and 
led to an active compound being synthesized. 

2.2.5 Relative Diversity/Similarity. This de- 
scribes an approach that measures "relative" 
similarity and diversity between chemical ob- 
jects, in contrast to the use of the concept of a 
total or "absolute" reference space (80). The 
ability of 3D pharmacophoric fingerprint
descriptors to separate ligand-binding
properties from chemical structure has enabled a
useful modification to the way the descriptor
is evaluated (37). It is possible to identify one 
of the points of a pharmacophoric description 
such as a triplet or quartet with a special
feature, such as a "privileged" substructure
deemed important for binding or a pharma- 
cophore group. A fingerprint can be generated 
that describes the possible pharmacophoric 
shapes from the viewpoint of that special 
point/substructure (see Fig. 5.8). This creates 
a "relative" or "internally referenced"
measure of diversity, enabling new design and
analysis methods. The technique has been
extensively used to design combinatorial
libraries that contain "privileged" substructures
focused on GPCRs (37a), and this is described
further in Section 5.1.2. The use of
"receptor-relevant" BCUT chemistry spaces from
DiverseSolutions provides a different approach
to a focused similarity/diversity measure (11d,
32e-h).



3 VIRTUAL SCREENING BY MOLECULAR 
SIMILARITY 

The use of molecular similarity to analyze 
large databases of structures using informa- 
tion derived from one or several ligands pro- 
vides a powerful ligand-based virtual screen- 
ing method (protein structure-based virtual 
screening methods are by comparison based 
on docking structures into a binding site). Vir- 
tual screening requires that a set of structures 
is ranked, with the goal of identifying new 
structures that have similar biological activ- 
ity, with top-scoring compounds sent for eval- 
uation in a biological assay. Usually, the re- 
quirement is to provide a small subset of 
compounds (10-1000) from a large set 
(100,000-1,000,000) of possible compounds 
for screening that is enriched in actives (i.e., 
contains a greater proportion of actives than 
that of the full compound set). In this context, 
enrichment involves identifying the highest 
number of new chemotypes as opposed to
analogs of the query structure(s).
Pharmacophoric methods have been found to be
particularly effective for this, building on the
successful use of 3D database searching for 
lead generation. Other similarity methods 
such as the use of 2D descriptors (Section
2.1.1) are also commonly used to identify
structures for screening based on the struc- 
ture of a known ligand. The use of similarity 
searching in chemical databases has been re- 
viewed by Willett et al. (2a), comparing newer 
types of similarity measure with existing ap- 
proaches. In this section the focus is on the use 
of the 3D pharmacophoric methods, which 
have been shown to provide a ligand-based vir- 
tual screening method that yields new chemo- 
types. 

3.1 Use of Geometric Atom-Pair Descriptors 

The topological atom-pair descriptors (24) 
have been extended by Sheridan and cowork- 
ers to geometric atom pairs (26), and shown to 
be effective at generating hit lists enriched in 
active molecules of different chemotypes. A set 
of precalculated conformations (~10-25) is
used for each molecule, and each atom is
assigned two different atom types: (1) a binding
property (donor, acceptor, acid, base,
hydrophobic, polar, and other); (2) a combination of
element type, number of neighbors, and
π-electron count. All combinations of atom pairs are
analyzed, for each conformation, and result- 
ant histograms of each probe and database 
molecule conformation are compared. The 
technique was compared with its topological 
equivalent (counting bond connections be- 
tween atoms to estimate interatomic "dis- 
tance"). This demonstrated that, although 
both methods were able to significantly enrich 
the highest ranking structures with other
active molecules for the same target (~20- to
30-fold enhancement over random in the top
300 compounds), the 3D structure-derived de- 
scriptors were able to show their advantage by 
picking out active chemotypes with greater 
structural variation relative to those from the 
2D searches. The analysis used about 30,000 
structures from the Derwent Standard Drug 
File (SDF; version 6, developed and distrib- 
uted by Derwent Information Ltd., London, 
England, 1991, now known as the World Drug 
Index) using probe molecules with known ac- 
tivity against a particular target to rank the 
database. Sheridan et al. (26c) have also 
shown how a single combined atom-pair de- 
scriptor from a set of molecules can be used in 
a single fast search to provide results similar 
to those from the slower process of individual
molecule-by-molecule searches. This provides
the ability to search mixtures, which some 
companies use for high throughput screening, 
in that both the search query and/or the data- 
base being searched can be mixtures of struc- 
tures. 
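
The "enhancement over random" quoted for such searches is commonly expressed as an enrichment factor: the hit rate among the top-ranked compounds divided by the hit rate in the whole database. The sketch below assumes each database compound already has a similarity score to the probe; the function names and the data are invented for illustration.

```python
def rank_by_score(scores):
    """Return compound names sorted from most to least similar to the probe."""
    return sorted(scores, key=scores.get, reverse=True)


def enrichment_factor(ranked_names, actives, top_n):
    """Hit rate in the top_n ranked compounds relative to the database hit rate."""
    top_n = min(top_n, len(ranked_names))
    n_actives = sum(1 for name in ranked_names if name in actives)
    if top_n == 0 or n_actives == 0:
        return 0.0
    hits = sum(1 for name in ranked_names[:top_n] if name in actives)
    return (hits * len(ranked_names)) / (top_n * n_actives)


# Invented scores: 1000 compounds, 10 actives, 3 of them ranked in the top 10.
scores = {f"cpd{i}": 1.0 - i / 1000 for i in range(1000)}
actives = {"cpd0", "cpd4", "cpd7"} | {f"cpd{i}" for i in range(500, 507)}
ranking = rank_by_score(scores)
ef_top10 = enrichment_factor(ranking, actives, top_n=10)
```

In this toy case 3 of the top 10 compounds are active against a background rate of 10 in 1000, a 30-fold enrichment. Counting distinct chemotypes rather than individual actives among the hits would be closer to the goal stated at the start of this section.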

3.2 Use of 3D Pharmacophore Fingerprints
(Three- and Four-Point) 

Some research groups have extended the at- 
om-pair descriptors to three-point (triplets) 
and four-point (quartets) pharmacophore de- 
scriptors (35, 37, 76, 81) as described in section 
2. These descriptors have a potentially supe- 
rior descriptive power, and a perceived advan- 
tage over atom pairs is the increased "shape" 
information (intrapharmacophore distance 
relationships) content of the individual de- 
scriptors (37a). The quartet (tetrahedral) 
four-point descriptors offer further potential 
3D content by including information on vol- 
ume and chirality (37a, 82), compared with 
the triplets that are components of the quar- 
tets and represent planes or "slices" through 
the 3D shapes. 

The fingerprints can be precalculated for 
database compounds, with conformational 
sampling, and stored in an efficient format 
(e.g., four-point pharmacophore fingerprints, 
where one line of encoded information uses
about 11 kilobytes of space for 1000 pharma- 
cophores). Probe fingerprints from one or 
more structures can be rapidly compared 
against such databases at speeds of >100,000 
compounds/min, even for large four-point 
pharmacophore fingerprints, representing 
about 10 million different pharmacophoric 
shapes. Similarity is measured using potential 
pharmacophore overlap and similarity indices 
such as the modified Tanimoto index (37a). 

The relative merits of two-, three- and four- 
point pharmacophore descriptors for different 
applications are an area of ongoing study (37,
83). Figure 5.9 shows some structurally di- 
verse endothelin antagonists that exhibit low 
2D similarity, but maintain significant over- 
lap of their four-point pharmacophore finger- 
prints (37a). 

3.3 Validation Studies 

The validation issue for ID, 2D, and 3D de- 
scriptors for similarity searching and virtual 




3 Virtual Screening by Molecular Similarity 



211 



OMe 





Figure 5.9. Structurally diverse endothelin antagonists exhibiting low 2D similarity while ndain- 
taining common pharmacophoric elements crucial to activity. 



screening has been addressed in several pub-
lications (5d, 14a, 45, 72, 84, 85). Conflicting
results have been reported, probably because
of the way the different descriptors were used
and biases in the test sets. Two primary con-
cepts have been applied to the analysis of bio-
logical data. The concept of "neighborhood"
behavior (84) as a measure of descriptor utility
has been promoted, based on the idea that if a
descriptor is able to cluster molecules with a
particular biological activity, the descriptor
encodes information regarding the require-
ments for that activity, and by extension is a
useful measure for molecular similarity/diver-
sity. Comparisons of 2D fingerprints with
pharmacophore fingerprints using this ap-
proach led to the conclusion that 2D descrip-
tors performed better than their 1D and 3D
counterparts (14a, 45). However, issues with
the studies undertaken have been raised (85).




These relate to bias in the data sets arising 
from the presence of closely related analogs, 
which by their nature have high 2D substruc-
tural similarities, and the way the 3D pharma-
cophoric descriptors were generated (single
conformation only) and used (bin setting, 
Tanimoto index). 

Some comparative studies of ligand-based 
virtual screening methods have been under- 
taken within Bristol-Myers Squibb (85) using 
more nearly optimal settings for pharmacophore
fingerprint generation [four-point pharma- 
cophores, 7 distance bins, and full conforma- 
tional analysis (37a)], which gave quite differ- 
ent results. An example using melatonin as a 
probe molecule to search against a database of 
about 150,000 compounds containing about 
250 known melatonin antagonists is shown in 
Fig. 5.10. The graph shows the hit rates ob- 
tained by similarity ranking in terms of the 

Figure 5.10. Ligand-based virtual screening/similarity analysis of a 150,000-compound database con-
taining about 250 known melatonin antagonists, showing the strong performance of the pharmacophore
descriptors, and the complementarity of atom pairs and pharmacophore descriptors when combined.



number of active compounds located across 
the top-ranking 1000 compounds. In this case, 
for the 2D descriptors shown, only the atom- 
pair (26a) descriptor (which has elements of a 
two-point pharmacophore fingerprint with in- 
tercenter distance replaced by bond count) 
produces comparable results. A 2D similarity 
search using the UNITY 2D fingerprint (13c), 
with a very low 50% similarity cutoff, pro- 
duced a hit list of 1669 compounds containing 
only 10 melatonin actives (an enrichment of 
3.6 relative to random screening of 1669 com- 
pounds, for which 2.8 actives would be ex- 
pected to be found). In contrast, the pharma- 
cophore fingerprint similarity search finds 93 
actives in the first 1669 compounds, a total
enrichment of >33 relative to random and
more than ninefold relative to the 2D similarity
search. Preliminary studies on systems with a
much wider structural diversity in active li- 
gand chemotypes suggested hit rates even 
more favorable to pharmacophore finger- 
prints, with cases where no 2D methods were 
able to improve on random hit rates. It is of 
interest that averaging the four-point phar- 
macophore/atom-pair rankings leads to even 
better results in the melatonin investigation, 
highlighting (45) the potential advantages of 
combined descriptors. 
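
The enrichment figures quoted above can be checked with a few lines of arithmetic. The sketch below recomputes them from the numbers in the text (1669 selected compounds, about 250 actives in about 150,000 compounds) and also shows one simple way of averaging two rankings; the function names and the three-compound ranking example are hypothetical, and the averaging scheme is an assumed reading of "averaging the rankings."

```python
def enrichment(actives_found, n_selected, n_actives_total, n_database):
    """Return (enrichment over random, number of actives expected at random)."""
    expected = n_selected * n_actives_total / n_database
    return actives_found / expected, expected


def mean_rank_order(rank_a: dict, rank_b: dict) -> list:
    """Order compounds by the average of their ranks under two descriptors."""
    return sorted(rank_a, key=lambda c: (rank_a[c] + rank_b[c]) / 2)


ef_2d, expected = enrichment(10, 1669, 250, 150_000)
ef_3d, _ = enrichment(93, 1669, 250, 150_000)
print(round(expected, 1))       # 2.8 actives expected at random
print(round(ef_2d, 1))          # 3.6 for the 2D fingerprint search
print(round(ef_3d, 1))          # 33.4 for the pharmacophore fingerprint search
print(round(ef_3d / ef_2d, 1))  # 9.3, the "more than ninefold" improvement

# Hypothetical ranks (1 = best) of three compounds under two descriptors.
four_point = {"c1": 1, "c2": 2, "c3": 3}
atom_pair = {"c1": 3, "c2": 1, "c3": 2}
print(mean_rank_order(four_point, atom_pair))  # ['c2', 'c1', 'c3']
```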



The advantage of 3D pharmacophore- 
based geometric descriptors over topological 
descriptors in being able to pick up new che- 
motypes with major 2D structural variations 
is of particular importance when exploiting a 
peptide lead. In such cases, the goal of screen- 
ing is normally a nonpeptidic molecule. Using 
the pharmacophore fingerprint from rela- 
tively large and flexible peptidic molecules 
(e.g., tetrapeptides), it is possible to identify 
structures that match just part of the pharma- 
cophoric information; a modified Tanimoto co- 
efficient can be used to reduce the penalty for 
only a partial match. It is possible to identify a 
set of reasonable structures that, as an ensem-
ble, sample most of the potential pharmaco-
phores exhibited by the peptide. With 2D
methods the high ranking molecules will tend 
to be similarly peptidic, rather than more 
druglike molecules exhibiting 3D properties of 
the peptide. 

An example of the use of peptidic informa-
tion comes from the work of Pickett et al. (76)
using the known tripeptide motif RGD (Arg-
Gly-Asp; see Fig. 5.11) that fibrinogen uses for
receptor binding (86). A database of 100,000
compounds, which had been seeded with fi- 
brinogen receptor antagonists covering a wide 
range of structural classes (see Fig. 5.12), was 

4 COMBINATORIAL LIBRARY DESIGN

4.1 Combinatorial Libraries 

Normally, the term combinatorial library im- 
plies a library of a few hundred to many thou- 
sands of products produced using high 
throughput robotics in a facility dedicated to
such purposes. In contrast, the term parallel
library implies a library of fewer than 10 to a few
hundred products produced using more or less 
traditional medicinal chemical synthetic pro- 
cedures or increasingly common low through- 
put robotics. In either case, lists of reactants 
are combined in a combinatorial fashion to 
yield an array of products. The methods de- 
scribed in this section are applicable equally to 
both cases. A strictly combinatorial combina- 
tion of reactants (or reactants and scaffold) 
produces the most efficient use of reactants 
and automation/robotics for a library synthe- 
sis. However, the issue of generating products 
that have suitable properties for biological 
screening and as hit/lead material is key, and 
constraints discussed below are often used, re- 
sulting in not all combinations of reactants 
being used. 

4.2 Combinatorial Library Design 

The process of combinatorial library design 
brings together many molecular diversity and 
similarity approaches with the aim of identify- 
ing a set of reactants that are to be combined 
(reacted) to form products. Combinatorial li- 
brary design is, inevitably, an iterative pro- 
cess: software is used to suggest lists of reac- 
tants; chemists accept some suggestions but 
reject others (for various reasons ranging 
from cost or availability to poor synthetic 
yield). If software is to be used to suggest re- 
placements for rejected reactants, it must be 
designed to accommodate this iterative pro- 
cess. 

Although the objective is always to identify 
which reactants should be used to make the 
products, there are two fundamental ap- 
proaches to library design: reactant-based 
methods and product-based methods. Purely 
product-based methods, which select (or 
cherry pick) desired products without regard 
for the number of reactants required to form 
those products (as in standard similarity 



searching or diverse set selection), lead to non- 
combinatorial synthetic schemes and are 
clearly at odds with the efficiency objectives of 
a combinatorial synthesis. Reactant-based 
methods suggest lists of reactants based solely 
on comparisons of reactant properties without 
regard for the properties of the resultant vir- 
tual products. Thus, reactant-based methods 
avoid the need for enumerating what could be 
very large numbers of virtual products and 
making even greater numbers of comparisons 
of product properties. Compromise solutions, 
discussed later, that approximate certain 
product properties from the reactants without 
enumeration have been developed. By directly 
selecting a desired number of each type of re- 
actant, the chemist can ensure an efficient, 
full-combinatorial array design, and the expe- 
diencies of reactant-based methods led to their 
widespread use for the design of both large 
and small combinatorial libraries. However, 
the growing awareness of a need to maintain 
druglike properties, if only for practical issues 
such as solubility, has led to the use of reac- 
tant-biased methods that consider the proper- 
ties of the products from enumerated struc- 
tures, for which the full array of ligand- and 
structure-based design methods can be ap-
plied, and the resultant synthesis of sparse ar-
rays (see later). In addition, the assumption
that optimal product diversity can be approx- 
imated by using diversity-optimized reactants 
has been questioned (87), and several product- 
based diversity methods for selecting reac- 
tants have been developed that consider the 
need for full or sparse combinatorial arrays in 
the design process. The rationale behind this 
observation can be understood when one con- 
siders that "constraints" based on whole mol- 
ecule properties and some form of molecular 
similarity/diversity are usually required. 
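
The tension between cherry picking and synthetic economy can be made concrete by counting the reactants a product selection requires. The following is a minimal sketch in which products are represented as hypothetical (A, B) index pairs; the cherry-picked list is an artificial worst case in which every product uses fresh reactants, not data from any study.

```python
from itertools import product


def reactants_needed(products):
    """Count the distinct A and B reactants required to make a set of (a, b) products."""
    a_used = {a for a, _ in products}
    b_used = {b for _, b in products}
    return len(a_used), len(b_used)


# Full combinatorial array: every one of 80 A's is combined with every one of 120 B's.
full_array = list(product(range(80), range(120)))
print(len(full_array), reactants_needed(full_array))        # 9600 (80, 120)

# Hypothetical cherry-picked selection in which no reactant is ever reused.
cherry_picked = [(i, i) for i in range(200)]
print(len(cherry_picked), reactants_needed(cherry_picked))  # 200 (200, 200)
```

A full array yields n_A × n_B products from only n_A + n_B reactants, whereas a selection made without regard to reactants can need as many reactants of each type as it has products.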

Pearlman (11, 87c) has made this argu-
ment quite convincingly by comparing the re-
sults of alternative diverse library design
methods in a low dimensional chemistry 
space, as illustrated in Fig. 5.13. Figure 5.13a 
depicts a virtual library of 634,721 allowed 
combinatorial AB products (remaining after 
optional filtering of the full virtual library 
based on Lipinski's Rule-of-5 "druglikeness" 
criteria) in a chemistry space specifically cho- 

Figure 5.13. (a) A virtual library of 634,721 allowed combinatorial AB products (after filtering out
products that failed Lipinski's Rule of 5 "druglike" criteria) shown in a BCUT chemistry space
specifically chosen to best represent the diversity of the virtual library. (b) The maximally diverse
9600-compound subset of the virtual library, illustrating the results of purely product-based "library
design." Although providing the maximal diversity, synthesis of these 9600 AB products would
require the use of 347 A's and 1024 B's, clearly unacceptable from the perspective of synthetic
economy (numbers of reactants and robotic control). (c) The 9600-compound library resulting from
the traditional, purely reactant-based library design strategy of selecting the 80 most diverse A's and
the 120 most diverse B's. Although providing user-selected synthetic economy, the diversity of these
9600 AB products is clearly quite poor. (d) The 9600-compound library resulting from the reactant-
biased, product-based (RBPB) algorithm developed by Pearlman and Smith (see Refs. 31, 87c and
text). The algorithm selected a different set of 80 A's and a different set of 120 B's, thus providing the
same level of user-selected synthetic economy, while also providing substantially greater diversity
than could be achieved using a purely reactant-based library design strategy. See color insert.



sen to best represent the diversity of that vir- 
tual library. Figure 5.13b illustrates an opti- 
mally diverse "library" of 9600 products 
selected by using cell-based diverse subset se-
lection to cherry pick the 9600 most diverse
products without regard for synthetic econ- 



omy. Although the diversity of these products 
is clearly optimal, the fact that 347 A's and 
1024 B's would be required to make the 9600 
AB products provides an equally clear indica- 
tion of why purely product-based methods are 
unsatisfactory from an economical perspec- 
thesis of a library of compounds with a high
degree of control over associated properties. 

Thus, the combinatorial library design pro- 
cess brings together many of the methods al- 
ready described for molecular similarity and 
molecular diversity coupled to synthetic feasi- 
bility considerations. Diversity-based and 
structure-based approaches to the design of 
virtual libraries have been reviewed (7, 91a). 
Both ligand-based and protein structure- 
based virtual screening methods can be used, 
with the combinatorial nature of the virtual 
compounds being exploited to increase the 
speed of the analysis. Some properties of the 
products can be estimated rapidly on the fly 
from the reactants, and products can be gen- 
erated in the active site. The CombiDOCK ap- 
proach that can rapidly analyze very large vir- 
tual databases in a binding site by connecting 
reactants to scaffolds docked in multiple 
orientations is discussed in Section 4.10. A 
genetic algorithm-based method for the com- 
binatorial docking of reactants has been de- 
scribed by Jones et al. (92), with the applica- 
tion of a ligand-docking genetic algorithm to 
screening combinatorial libraries. 

A challenge in the design of small- and me- 
dium-sized focused combinatorial libraries is 
to harness for use in library design the experi- 
ence and knowledge gained in generating 
structure-activity relationships (91b). Screen- 
ing libraries biased for pharmaceutical discov- 
ery are often designed to augment the struc- 
tural diversity of a chemical library. The 
approach used in the LASSOO algorithm (93) 
is based on the identification of compounds 
from a virtual library that are most different 
from those already present in a screening set 
and from a reference set of undesirable com-
pounds, while being simultaneously most sim-
ilar to a set of compounds with desirable char-
acteristics. An illustration of the method using
bit-string structure descriptors is given in ref. 93.

Combinatorial library design approaches 
have been discussed (94), with the design of 
library subsets that simultaneously optimize 
the diversity or similarity of a library to a tar- 
get, properties (such as druglikeness) of the 
library members, properties (such as cost or 
availability) of the reactants required to make 
them, and the efficiency for array synthesis. 
The authors showed that libraries can be designed to
contain molecules constrained to certain drug-
like properties with only a small trade-off in
terms of the maximum possible diversity. 

The design of leadlike combinatorial librar- 
ies is an approach of more recent interest. A 
lower molecular weight starting point is ad-
vantageous, in that bulk can be added for po-
tency/selectivity/properties without exceeding
"rule of 5" parameters for orally absorbed
drugs; otherwise a more labor-intensive step 
may be needed to identify a smaller active part 
of the hit. The properties required of library
compounds intended to provide leads suitable
for further optimization, which may be rather
different from final optimized leads, have been
reviewed (95).

Thus, library design is a complex optimiza- 
tion problem with often competing con- 
straints, including requirements to have com- 
binatorial efficiency and/or several specified 
product properties (both desired and nonde- 
sired). Methods such as genetic algorithms, 
simulated annealing, and Monte Carlo optimi- 
zation have been used, and iterative cyclic ap- 
proaches applied. The next section describes 
the application of these methods within the 
context of library design but the reader should 
note that some of these methods are applicable 
only for the design of diverse libraries. 

4.3 Optimization Approaches 

The most basic product-based selection pro- 
cess used in library design is an order-depen- 
dent analysis of products, selecting a com- 
pound if it exhibits sufficient "diversity" to 
products already selected. This approach was 
used in the Chem-X/ChemDiverse software 
with three- and four-point pharmacophore 
fingerprints. A compound was selected if the 
overlap with the ensemble fingerprint of al- 
ready selected compounds was less than a 
user-defined amount; that is, the molecule 
contains a significant number of pharmacoph- 
ores not already exhibited in selected com- 
pounds. This cherry-picking process is an effi- 
cient method for ensuring a high diversity 
library, but can be a combinatorially ineffi- 
cient selection for synthesis, with no explicit 
reference to the constituent reactants being 
made (see Section 4.2 above for further exam- 
ples). A preferred selection for combinatorial 
efficiency is arrays of reactants, in which all 
reactants from one component of a combina-
torial library are reacted with all the reactants
in the other components, or sparse arrays, in 
which subsets of reactants are combined. Ad- 
ditional constraints such as physicochemical 
properties and flexibility are addressed implic- 
itly by assigning upper and lower bounds for 
given properties, or controlling the order in 
which molecules are processed. 
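
The order-dependent selection described above can be sketched in a few lines. This is only an illustration of the general idea, assuming fingerprints are held as Python sets of pharmacophore indices and that overlap is measured as the fraction of a molecule's pharmacophores already present in the ensemble; it is not the Chem-X/ChemDiverse implementation, and the names and data are hypothetical.

```python
def select_diverse(fingerprints: dict, max_overlap: float) -> list:
    """Greedy, order-dependent pick: keep a compound if enough of its pharmacophores are new."""
    ensemble = set()  # union of the fingerprints of all compounds selected so far
    chosen = []
    for name, fp in fingerprints.items():
        if not fp:
            continue  # nothing to contribute
        overlap = len(fp & ensemble) / len(fp)
        if overlap < max_overlap:
            chosen.append(name)
            ensemble |= fp
    return chosen


# Hypothetical fingerprints, processed in insertion order.
fps = {
    "m1": {1, 2, 3, 4},
    "m2": {1, 2, 3, 5},     # 75% already covered by m1, so rejected
    "m3": {6, 7, 8, 9},
    "m4": {4, 10, 11, 12},  # 25% covered, so accepted
}
print(select_diverse(fps, max_overlap=0.5))  # ['m1', 'm3', 'm4']
```

Because the ensemble grows as compounds are accepted, changing the processing order changes the result, which is why controlling that order can act as an implicit constraint.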

To allow pharmacophore fingerprints to be
used in a way that yields a combinatorially
efficient selection of reactants, with the
explicit inclusion of additional molecular
properties such as a balance of druglike
physicochemical properties and shape
descriptors, the HARPick program
(78a, b) was created. A stochastic optimization
technique [Monte Carlo simulated annealing 
(96)] was used to enable selections in reactant 
space, whereas diversity is still calculated in 
product space. User-defined flexibility for the 
reactant array sizes was possible, and addi- 
tional descriptors could be used (e.g., to ad- 
dress the selection of non-drug-like com- 
pounds). The pharmacophore fingerprint 
(three-point, triplets) was used in a nonbinary 
mode (the frequency of occurrence of each 
pharmacophore was calculated), and the 
HARPick diversity measure was tuned to in- 
clude a term (Conscore) to force molecules to 
occupy relative rather than absolute voids in 
pharmacophore space. This avoids the prob- 
lem of saturation of the fingerprint with large 
databases in a binary mode, particularly a 
problem with the three-point pharmacophore 
descriptors. It was thus possible to design 
combinatorial libraries that exhibited phar- 
macophores that were poorly represented in a 
reference set of compounds. The Conscore 
constraint score sums the product of the num- 
ber of times pharmacophore i has been hit for 
molecules selected from the current data set 
with the score associated with pharmacophore 
i for the constraining library. The Conscore 
term can be inverted, enabling focused de- 
signs, in which the selection of products that 
occupy the more highly occupied bins (e.g., 
from a set of active compounds) is desired. The
flexibility and success of this kind of stochastic
optimization methodology have led to its use by
many other researchers for library design (5c, 
6b, 23d, 78c, 97c,d). Simulated annealing has 



been used to perform reactant selection for 
combinatorial libraries based on three-point 
pharmacophores (78a, b), as described above, 
and other metrics (6b, 23d, 97c,d). 
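
The idea of making moves in reactant space while scoring diversity in product space can be illustrated with a small simulated annealing loop. The sketch below is not HARPick: the score is simply the number of distinct pharmacophores covered by the full A × B array, the linear cooling schedule is an arbitrary choice, and `toy_fp` is a made-up function standing in for a real product fingerprint calculation.

```python
import math
import random


def coverage(a_sel, b_sel, product_fp):
    """Diversity in product space: distinct pharmacophores over the full A x B array."""
    covered = set()
    for a in a_sel:
        for b in b_sel:
            covered |= product_fp(a, b)
    return len(covered)


def anneal(a_pool, b_pool, n_a, n_b, product_fp, steps=2000, t0=2.0, seed=0):
    """Monte Carlo simulated annealing over reactant selections; returns (score, A's, B's)."""
    rng = random.Random(seed)
    a_sel, b_sel = rng.sample(a_pool, n_a), rng.sample(b_pool, n_b)
    score = coverage(a_sel, b_sel, product_fp)
    best = (score, list(a_sel), list(b_sel))
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9  # assumed linear cooling schedule
        sel, pool = (a_sel, a_pool) if rng.random() < 0.5 else (b_sel, b_pool)
        unused = [r for r in pool if r not in sel]
        if not unused:
            continue
        i = rng.randrange(len(sel))
        old = sel[i]
        sel[i] = rng.choice(unused)  # the move is made in reactant space
        new_score = coverage(a_sel, b_sel, product_fp)
        if new_score >= score or rng.random() < math.exp((new_score - score) / temp):
            score = new_score  # accept (always if no worse, sometimes if worse)
            if score > best[0]:
                best = (score, list(a_sel), list(b_sel))
        else:
            sel[i] = old  # reject: undo the swap
    return best


def toy_fp(a, b):
    """Hypothetical product fingerprint: three pharmacophore indices in the range 0..49."""
    return {(a * 7 + b * 3) % 50, (a + b) % 50, (a * b) % 50}


best_score, best_a, best_b = anneal(list(range(20)), list(range(20)), 4, 5, toy_fp)
print(best_score, sorted(best_a), sorted(best_b))
```

Because every accepted state is a full array of the requested dimensions, the result is combinatorially efficient by construction; additional terms such as a property penalty would simply be added to the score.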

Genetic algorithms (GA) are another class 
of optimization techniques widely used within 
chemistry (98) that have been explored for li- 
brary design. A GA is an attempt to utilize the 
Darwinian process of evolution in an optimi- 
zation procedure. A solution is represented by 
a string of fixed length, the chromosome, and 
is evaluated according to some criterion to 
give the fitness score, for example, the phar- 
macophore coverage of the solution (78b). The 
GA maintains a number of chromosomes (po- 
tential solutions) that are ranked on their fit- 
ness and are then modified according to oper- 
ators including mutation, where one element 
of the string is changed, and crossover, where 
the string is cut at some position and swapped 
with equivalent portions of another solution. 
These new solutions are evaluated and the 
process is repeated for a defined number of
iterations or until all (or most) solutions con- 
verge on one result. For library design, the 
string represents the selected monomers at 
each variable position of the library. Evalua- 
tion involves enumerating the sublibrary de- 
fined by the solution and calculating the score 
associated with the products. The stochastic
nature of the process means that the GA is run 
several times to ensure good convergence. 
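
A compact sketch of such a GA for a two-component library is given below. It is an illustration under simplifying assumptions, not any published program: the chromosome is stored as one list of monomers per library position, crossover cuts only at position boundaries (so a child never contains duplicate monomers), the better half of each generation is kept unchanged, and the fitness function `toy_fitness` is hypothetical.

```python
import random


def run_ga(pools, sizes, fitness, pop_size=30, generations=60, seed=0):
    """Evolve monomer selections; returns the fittest chromosome found."""
    rng = random.Random(seed)

    def random_chromosome():
        # One group of genes per variable position of the library.
        return [rng.sample(pool, k) for pool, k in zip(pools, sizes)]

    def mutate(chrom):
        # Change one element: swap a selected monomer for an unused one.
        child = [list(genes) for genes in chrom]
        pos = rng.randrange(len(child))
        unused = [m for m in pools[pos] if m not in child[pos]]
        if unused:
            child[pos][rng.randrange(len(child[pos]))] = rng.choice(unused)
        return child

    def crossover(p1, p2):
        # Cut at a position boundary and take the remainder from the other parent.
        cut = rng.randrange(1, len(p1)) if len(p1) > 1 else 0
        return [list(genes) for genes in p1[:cut] + p2[cut:]]

    population = [random_chromosome() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)  # rank on fitness
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(chrom):
    """Hypothetical score: distinct 'pharmacophores' covered by the enumerated sublibrary."""
    return len({(a * 7 + b * 3) % 40 for a in chrom[0] for b in chrom[1]})


pools = [list(range(15)), list(range(15))]
best = run_ga(pools, sizes=[3, 4], fitness=toy_fitness)
print(best, toy_fitness(best))
```

Running the function with several different `seed` values and comparing the results is a simple way to check convergence, in the spirit of the repeated runs mentioned above.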

A GA was used by Sheridan and Kearsley
(99) to design peptoid libraries focused on cho-
lecystokinin by scoring on similarity to two
peptide leads. Biological activity, rather than a
computed fitness, has been used as the score in
a directed combinatorial synthesis program
(100). Brown and Martin developed GA-
LOPED (101) as a way to design combinatorial
mixtures. The SELECT program (78c) com-
bines measures of diversity and the physical
properties of the designed library. The library 
can be designed to be both internally diverse 
and diverse with respect to a reference popu- 
lation. Physical properties are optimized by 
comparing to a user-defined profile for the 
property of interest, c log P for example. As for
the HARPick approach (78a, b), however, it is 
necessary to define a weighting scheme be- 
tween the different elements of the score, 
which leads to a number of difficulties. Selec-
tion of the weights is nonintuitive when com-
paring different properties or concepts (e.g.,
diversity and c log P) and the use of weights
constrains the search space. Figure 5.14a il-
lustrates how changing the function weight in
a SELECT run alters the final solution; note
also that the two objectives, molecular weight
and diversity, are competing in this example.
Given these limitations, a novel modification
has been made to the SELECT methodology.
The GA in SELECT is replaced by a multiob-
jective GA (MOGA) (102) that eliminates the
need for a weighting scheme. Instead, each el-
ement of the scoring function is optimized in-
dependently and solutions are scored according
to the idea of dominance (see Fig. 5.15). The so-
lutions of rank 0, the nondominated solutions,
are those solutions for which there is no supe-
rior solution when considering all objectives;
solutions of rank 1 are dominated by one other
solution, and so on. It is these ranks that are
used to describe the fitness of the solution.
Solutions of rank 0 are said to define the Pa-
reto surface, as displayed in Fig. 5.14b, which
overlays the MOGA results onto the SELECT
results. Thus, the MOGA has several advan-
tages over a traditional GA. In one run it gen-
erates multiple solutions that are equally
valid, more fully explores solution space, and
gives the designer an understanding of the re-
lationships between the different objective
functions.

Figure 5.14. (a) Results from multiple SELECT runs with alternative weightings
for molecular weight vs. diversity. Filled triangles, 1.0 × Div and 1.0 × MW;
filled circles, 1.0 × Div and 0.5 × MW; filled squares, 10.0 × Div and 1.0 × MW.
(b) As in a, with results of a single MOGA run shown as crosses. [Reproduced
from V. Gillet, et al., J. Chem. Inf. Comput. Sci., 42, 375-385 (2002) with
permission of the American Chemical Society.]
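
Dominance-based ranking is straightforward to sketch. In the example below the rank of a solution is taken to be the number of other solutions that dominate it, so that rank 0 marks the nondominated set; both objectives are assumed, for simplicity, to be maximized, and the four sample points are invented.

```python
def dominates(p, q):
    """p dominates q if it is no worse in every objective and better in at least one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def pareto_ranks(points):
    """Rank of each point = number of points that dominate it (0 = nondominated)."""
    return [sum(dominates(q, p) for q in points) for p in points]


# Hypothetical (objective 1, objective 2) scores for four candidate libraries.
solutions = [(1.0, 5.0), (3.0, 4.0), (2.0, 3.0), (4.0, 1.0)]
print(pareto_ranks(solutions))  # [0, 0, 1, 0]
```

An objective that should be minimized, such as a deviation from a target molecular weight profile, can be handled by negating it before ranking.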

The RBPB algorithm of Pearlman and 
Smith, described above (Section 4.2), consid-
ers all possible candidate libraries that sat-
isfy the user's constraints regarding economy.
These include min/max range constraints re- 
garding library size (e.g., number of AB prod- 
ucts) and the number of each type of reactant 
(e.g., number of A's and number of B’s). These 
also include specification of the minimal unit 
dimensions (MUDs), which define the small- 
est combinatorial array that the user is willing 
to address on the robotic table. Each candidate 
library corresponds to a different way of, at 
least conceptually, arranging the required 

Figure 5.15. Pareto optimality. The filled circles
represent rank zero or nondominated solutions for
functions f1 and f2. Point C is rank 1 because it is
dominated by point B. (Permission as in Fig. 5.14.)



number of MUDs to construct a library within 
the user's specified range limits. Each candi- 
date library is scored based on an appropriate 
function of the scores of the individual prod- 
ucts it contains divided by its size; hence, an 
average product score. The reactants used to 
make each candidate library are determined 
by reactant scores, which are functions of 
product scores and which are updated at each 
step of the design of that particular candidate 
library. For example, at a given step in the 
design process, the reactant score for a given
A-type reactant depends on the scores of the
products actually accessible, given the current
choices of B-type reactants. The score also
depends on the
scores of the products that could be made us- 
ing B-type reactants, which may be selected at 
a subsequent step in the process. The candi- 
date libraries with the highest scores are out- 
put for the user's final decision. In addition to 
being remarkably thorough yet fast, the
RBPB also makes it very easy to address the 
iterative nature of library design and to sug- 
gest replacements for previously suggested re- 
actants that had to be rejected for one reason 
or another. 

A rapid computational method for lead evo- 
lution has been described by the CombiChem 
(now DeltaGen) group (39). Their 3D compu- 
tational approach for lead evolution is based 
on a pharmacophore fingerprint approach us- 
ing multiple pharmacophore hypotheses. A set 
or ensemble of hypotheses is generated that is 



most able to discriminate between active and 
inactive molecules. The ensemble comes from 
an analysis of a large number of pharma- 
cophore hypotheses, with full conformational 
sampling for both active and inactive com- 
pounds. The ensemble hypothesis is used to 
rapidly search virtual chemical libraries to 
identify compounds for synthesis. Large vir- 
tual libraries (e.g., a million structures) can be 
analyzed efficiently. The method was applied 
to α1-adrenergic receptor ligands, where het-
erocyclic α1-adrenergic receptor ligand leads
were evolved to highly dissimilar active N-sub-
stituted glycine structures.

LiBrain (103) is a collection of software 
modules for automated combinatorial library 
design, including the incorporation of desir- 
able pharmacophoric features and the optimi- 
zation of the diversity of designed libraries. A 
Chemistry Simulation Engine module is 
trained by chemists to determine the suitabil- 
ity of reactants for a specified reaction, to rec- 
ognize the risk of undesirable side reactions, 
and to predict the structures of the most likely
reaction products, so as to circumvent major 
bottlenecks associated with automating the 
process. 

Legion and Selector (66c, d, 104) are soft- 
ware from Tripos (13c) for characterization, 
comparison, and sampling of sets of com- 
pounds, including a combinatorial builder 
(104), with available descriptors including fin-
gerprints and atom-pair distances. Clustering
tools (Hierarchical, Jarvis-Patrick, and Recip-
rocal Nearest Neighbor) are available, as are
compound selection and diversity comparison
methods including Tanimoto Dissimilarity, the
Reciprocal Nearest Neighbor approach, and
the OptiSim algorithm (see Section 2.2.3).

4.4 Handling Large Virtual Libraries 

The rate-limiting step in a product-based li-
brary design process is often the calculation of
molecular descriptors. This becomes particu- 
larly acute as one moves into the 3D arena, of 
course, but even the simplest 2D descriptors 
take a finite time to calculate. In addition, 
there are the logistics associated with storing 
virtual libraries of potentially tens of millions
of compounds. The ability to search within the 
possible chemical space of a particular chem-
istry, as opposed to the limited space of syn-
thesized compounds, is an important compo-
nent of lead identification because this allows
a weak hit from primary screening to be rap-
idly expanded into a more potent lead. Exist-
ing chemical database systems can be used or
readily modified to benefit from the combina-
torial nature of libraries (64a) but they do not
overcome the fundamental issues.

Downs and Barnard (105a) have proposed
an elegant solution to these problems using
the Markush representation commonly used
in chemical patents. The key component of
their approach is that descriptor calculation
and diversity analysis can be performed without
the need for full enumeration of the products.
In other words, both storage and calculation
will tend to scale as the sum of the
number of building blocks in the library
rather than the product, as in techniques requiring
enumeration. The method has been
developed into a software suite and released
commercially as the LibEngine module of the
Cerius2 suite for combinatorial library analysis
and design (105d).

The background and theory behind the approach
have been published (105b). In summary,
the algorithm relies on identifying a
core and associated R-groups that define the
library. This may or may not be directly related
to the manner of synthesis. For example,
imagine a tripeptide library synthesized from
20 X 20 X 20 amino acids. The algorithm defines
the tripeptide backbone as the core and
the amino acid side chains as the R-groups.
Partial substructure fingerprints are calculated
on a fragment basis representing the
core and R-groups, taking full account of the
attachment to the core and the possibility that
a particular path may extend between two R-groups.
The partial fingerprints are then combined
into the full fingerprint, a relatively fast
exercise. The approach is a couple of orders of
magnitude faster than calculating fingerprints
from fully enumerated products. Additional
molecular properties such as molecular
weight, hydrogen-bond donor and acceptor
counts, and log P can be calculated in a similar
manner, as well as topological indices. Fingerprints
or property data can also be calculated
on demand for use with clustering algorithms,
avoiding the overhead of storing and retrieving
them. There is also interest in extending
the approach to 3D property calculation
(105c).
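The scaling argument can be illustrated for a simple additive property with a minimal Python sketch. This is not the LibEngine implementation: the core weight, the R-group tables, and all function names are hypothetical, and the numbers are illustrative rather than real fragment masses. Substructure fingerprints need the additional path handling described above, which is omitted here.

```python
# Hypothetical fragment tables (illustrative values, not real masses).
CORE_WEIGHT = 100.0
R_GROUPS = [
    {"Gly": 1.0, "Ala": 15.0, "Val": 43.0},  # site R1
    {"Gly": 1.0, "Ser": 31.0},               # site R2
    {"Ala": 15.0, "Phe": 91.0},              # site R3
]


def stored_values(r_groups):
    """Number of values that must be stored: one per building block (a sum)."""
    return sum(len(site) for site in r_groups)


def library_size(r_groups):
    """Number of products in the virtual library (a product)."""
    size = 1
    for site in r_groups:
        size *= len(site)
    return size


def product_weight(choice, r_groups=R_GROUPS, core=CORE_WEIGHT):
    """Additive property of one product, assembled on demand from fragments."""
    return core + sum(site[name] for site, name in zip(r_groups, choice))


def library_mean_weight(r_groups=R_GROUPS, core=CORE_WEIGHT):
    """Mean over the full library, obtained without enumerating any product."""
    return core + sum(sum(site.values()) / len(site) for site in r_groups)


def library_weight_range(r_groups=R_GROUPS, core=CORE_WEIGHT):
    """Lowest and highest product value, again from per-site data only."""
    low = core + sum(min(site.values()) for site in r_groups)
    high = core + sum(max(site.values()) for site in r_groups)
    return low, high
```

For the 20 X 20 X 20 tripeptide example, the same bookkeeping stores 60 fragment values to describe 8000 products.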

An alternative approach has been taken by 
Agrafiotis and colleagues. In a conference pre- 
sentation (106) they show how a neural net- 
work can be trained on a small sample of enu- 
merated combinatorial products to reproduce 
2D molecular descriptors and properties for all 
library members without the need to con- 
struct their connection tables. 

A method for rapid similarity searching in 
large combinatorial spaces using a new algo- 
rithm Ftrees-FS was published by Rarey and 
Stahl (135). The similarity search is based on 
the feature tree similarity measure represent- 
ing molecules by tree structures. Combinato- 
rial chemistry spaces are handled as a whole 
rather than looking at subsets of enumerated 
compounds. A set of 17,000 fragments of known 
drugs was used, which could be combined to 
10^18 compounds of reasonable size. A novel
ChemSpace approach (45a) for searching large 
virtual libraries that does not require enumer- 
ation has also been developed by Tripos, using 
shape descriptors (topomeric fingerprints) on 
the monomers, and has been used for targeted 
library design (45b). 

4.5 Library Comparisons 

In the previous sections we described the de- 
sign of libraries based on a number of user- 
defined criteria, whether they were focused or 
whether they were of a more general nature. 
So far, these designs have been undertaken, 
treating the library in isolation, with the in- 
clusion of property profiles in methods such as 
HARPick and SELECT to ensure that the syn- 
thesized compounds are of a suitable physical 
nature. In this sense, the designed library can 
be said to be internally diverse; that is, the 
selected compounds are diverse within the 
limited chemistry space of all virtual products. 
Even for very large virtual libraries, the chem- 
istry space is still small with respect to the 
possible chemistry universe. It is very diffi- 
cult, a priori, to address how "diverse" a de- 
signed library is compared to a library gener- 
ated with another set of reactions without 
having to go through the computationally ex- 
pensive process of computing all pairwise sim- 
ilarities between members of the libraries. 
Nevertheless, questions such as "How diverse
is the library compared to the screening collec- 
tion?" or "Which of the following chemistries 
should I choose for a library?" are often posed 
and methods are required to answer them. 

Distance-based methods such as clustering 
can be and have been used but suffer from a 
number of drawbacks both in terms of speed 
and the fact that the exercise needs to be re- 
peated for every additional library (i.e., there 
is no common frame of reference). In addition, 
all pairwise comparisons would need to be per- 
formed. Thus, Shemetulskis et al. (107) used 
clustering methods to compare the Parke- 
Davis corporate collection (117,000 com- 
pounds) with external compounds from 
Chemical Abstracts Service (380,000) and 
Maybridge (42,000). Even today, clustering 
half a million compounds is a daunting task 
and interpreting the results is not straightfor- 
ward. The Jarvis-Patrick method employed by 
Shemetulskis et al. has several input parame- 
ters, including the need to predefine the num- 
ber of clusters. Voigt et al. (108) compared the 
National Cancer Institute (NCI) database, a 
publicly available database of compounds used 
in the NCI screening program, to a number of 
compound databases. The diversity of each 
collection was estimated by the number of 
compounds selected by use of a diversity-selec- 
tion algorithm as a function of database size. 
The similarity overlap between two databases 
has been determined by calculating the per- 
centage of compounds of the first database for 
which a compound exists in the second database
with a similarity greater than or equal to a
specified cutoff (109). Such an approach neces- 
sitates the calculation of the Tanimoto simi- 
larity coefficient of all compounds in a data- 
base with all compounds in the other 
databases. As indicated before, the largest 
drawback of distance-based methods is that 
they give no indication of where the voids are 
within the chemistry space, and searching an 
additional compound source for interesting 
compounds would require reclustering. 
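The percentage-overlap measure described above can be sketched in a few lines of Python. This is an illustration rather than the published procedure: fingerprints are assumed to be held as Python sets of "on" bit positions, and the function names are hypothetical. The nested loop makes the all-against-all cost of the approach explicit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints stored as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_overlap(db1, db2, cutoff):
    """Percentage of db1 compounds that have at least one db2 compound
    with a Tanimoto similarity greater than or equal to the cutoff."""
    if not db1:
        return 0.0
    hits = sum(
        1 for fp1 in db1
        if any(tanimoto(fp1, fp2) >= cutoff for fp2 in db2)
    )
    return 100.0 * hits / len(db1)
```

Note that the measure is asymmetric: the overlap of the first database with the second is generally not the overlap of the second with the first.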

Therefore, partition/cell-based methods 
are preferred for such library comparison 
tasks. They provide a common frame of refer- 
ence in which it is possible to identify voids 
within the chemistry space of a population. It
must be emphasized that the chemistry space
is still defined with respect to a reference population.
By comparing the libraries with reference
to a population (REFDB), such as a corporate
database or a combination of known
drug databases, one can make statements
such as: library A shows the greatest overlap
with REFDB, whereas library B fills the greatest
number of empty or low occupancy cells.
Cummins et al. (22) used a cell-based ap- 
proach to compare five databases, including 
the Wellcome Registry, to select screening sets 
of diverse compounds. Topological indices and 
a measure of free energy of solvation were 
taken as the descriptors and factor analysis 
was used to combine them and define a four- 
dimensional chemistry space that was then 
partitioned. Outliers were removed to allow 
the partitioning to focus on the most densely 
populated region. The use of pharmacophore 
descriptors in such a task was illustrated by 
Mason and Pickett (4), where the pharma- 
cophore overlap between three libraries was 
calculated. It was possible to identify the li- 
brary covering regions of pharmacophore 
space not covered by the other two. Alterna- 
tively, given that library A is synthesized and 
gives hits in screening, then presumably the 
library that overlaps best with A should be 
made. Pearlman and Smith (11d) have
adapted their DVS software to identify what
they term a receptor-relevant subspace, where
the BCUT metrics are selected to best group
the active compounds within a population (in
fact, it is possible to have several groupings of
actives within the space) (see Section 2.2.1.1).
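A minimal Python sketch of a cell-based comparison against a reference population follows. It assumes a low-dimensional descriptor space with fixed axis ranges and equal-width bins, and it clamps out-of-range values to the edge cells, which is a simplification (Cummins et al. removed outliers instead). All names are hypothetical.

```python
def cell_of(point, lows, highs, bins):
    """Map a descriptor vector to its cell, a tuple of bin indices."""
    cell = []
    for x, lo, hi in zip(point, lows, highs):
        idx = int((x - lo) / (hi - lo) * bins)
        cell.append(min(max(idx, 0), bins - 1))  # clamp to the edge cells
    return tuple(cell)


def occupied_cells(points, lows, highs, bins):
    """Set of cells occupied by at least one compound."""
    return {cell_of(p, lows, highs, bins) for p in points}


def compare_to_reference(library, reference, lows, highs, bins):
    """Count cells shared with the reference and cells the library newly fills."""
    ref = occupied_cells(reference, lows, highs, bins)
    lib = occupied_cells(library, lows, highs, bins)
    return {"overlap": len(lib & ref), "new_cells": len(lib - ref)}
```

Because every library is mapped onto the same grid, a further library can be assessed without repeating the analysis for those already processed.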

Comparing two populations by pharma- 
cophore coverage, although straightforward, 
does ignore the contribution from individual 
compounds. This is important, in that two li- 
braries could cover similar regions of pharma- 
cophore space but individual compounds in 
the two libraries could be displaying different 
subsets of the total pharmacophores covered. 
This prompted Pickett et al. (70) to explore an 
alternative approach. In this case, a number of
potential scaffolds were available and the aim
was to find which of these would best complement
previously synthesized libraries. Virtual
libraries were generated using a predefined
set of reactants and pharmacophore fingerprints
were calculated for these and the previously
synthesized libraries. By use of measures
proposed by Turner et al. (69b), the
virtual libraries were compared to the synthe- 
sized libraries at both a whole library level and 
an individual molecule level. From this analy- 
sis it was possible to select the scaffold that 
best complemented the previously synthe- 
sized libraries. 

An alternative methodology based on the
ring content of a database, using precalculated
structure-based hashcodes, has been proposed
(110). The comparison of the hashcode tables 
can be used to compare two databases and the 
number of distinct ring-system combinations 
can be used as an indicator of database diver- 
sity. A method for diversity assessment called 
the saturation diversity approach, based on 
picking as many mutually dissimilar com- 
pounds as possible from a database was also 
proposed. The methods were used to compare 
a number of public databases and gave similar 
results. 
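The idea behind a saturation-type diversity count can be sketched as follows. This is a simple greedy illustration, not the published algorithm: a compound is kept only if it is dissimilar to everything already kept, and the number kept serves as the diversity indicator. A greedy pass does not guarantee the largest possible set, and the set-based fingerprints, the threshold, and the function names are assumptions.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints stored as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def pick_mutually_dissimilar(fingerprints, threshold):
    """Greedily keep compounds whose similarity to every compound already
    kept is below the threshold; returns the indices of the kept compounds."""
    picked = []
    for i, fp in enumerate(fingerprints):
        if all(tanimoto(fp, fingerprints[j]) < threshold for j in picked):
            picked.append(i)
    return picked


def saturation_diversity(fingerprints, threshold):
    """Diversity indicator: how many mutually dissimilar compounds were found."""
    return len(pick_mutually_dissimilar(fingerprints, threshold))
```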

4.6 Pharmacophore-Based Fingerprints 

The examples of GPCR library design (de- 
scribed in Section 5.1.2) and protein-site de- 
sign for Factor Xa (described in Section 5.3) 
illustrate the use and relevance of pharma- 
cophore-based fingerprints in library design. 
A pharmacophoric bias has been a major component
of many library designs (111), used in
the context of focused or biased libraries. 
Their broad applicability is important, with 
the same descriptors being used for diverse 
library design, screening set selection, and fo- 
cused library design. This provides a consis- 
tent approach that extends to protein-site 
based pharmacophores as discussed above. 
Their ability to determine the similarities and 
differences between structurally diverse mol- 
ecules and sites is very powerful. An ensemble 
pharmacophore data set measure is often 
used, which attempts to condense the individ- 
ual molecule pharmacophore fingerprints into 
a single measure that describes the important 
features of the data set as a whole (36, 37, 
78a,b). 

McGregor et al. (112) have recently pub- 
lished a version of pharmacophore finger- 
printing (the PharmPrint method) applied to 
QSAR and focused library design that uses a 
limited basis set of 10,549 three-point pharmacophores.
They included the usual six pharmacophoric
features, plus an additional definition
of "other" for all remaining unassigned
atoms. A subset of the MDDR database (13a) 
was used to define a reference set of bioactive 
molecules, separated into target classes (gene 
families). The discriminating power of several 
molecular descriptors was measured using the 
target class assignments for this set, and it 
was found that the pharmacophore finger- 
print outperformed other descriptors. 

4.7 Combined Pharmacophore Fingerprints 
and BCUTs 

Library design using a simultaneous optimiza- 
tion of BCUT chemistry-space descriptors (11) 
and four-point pharmacophore fingerprints 
has been reported (32d, 37d). The authors in- 
vestigated the feasibility and results in terms 
of complementarity of simultaneously opti- 
mizing two product-based descriptors for reac- 
tant selection from large virtual libraries. Di- 
versity around a chosen chemistry was the 
goal of the studies reported, but the approach 
could equally be applied to optimize to a de- 
sired distribution of properties, say, from sets 
of biologically active compounds. A simulated 
annealing algorithm (97) was used to combine 
both components in a single optimization pro- 
cedure. The choice was based on the ease of 
implementation and the ability to include 
multiple components in the objective (23d), an 
important goal in many recent designs, if only 
to modulate physicochemical properties to 
druglike ranges. In this example a small, fully 
enumerated virtual library of 86,140 amide 
compounds was constructed from carboxylic 
acids and primary amines present in the ACD 
(Available Chemicals Directory). The prod- 
ucts of the optimized and random starting re- 
actant sets were compared using average 
nearest-neighbor distances, and the Hopkins
statistic (113), which evaluates the degree of 
clustering in a data set, together with the four- 
point pharmacophore fingerprint diversity. 
The potential utility for very large virtual li- 
braries, where precalculation of all the phar- 
macophore fingerprints would not be feasible, 
was illustrated by calculating four-point phar- 
macophore fingerprints for virtual library 
compounds on the fly. The fingerprints were 
calculated during the optimization procedure 
and stored in a compact encoded form, with 
previously calculated fingerprints reused as
needed (calculation times ~1-5 s with conformational
sampling per structure on an SGI
R10000 machine). Diversity was evaluated for
the BCUT chemistry space using the ratio of 
filled to possible filled cells for the virtual li- 
brary. Four-point pharmacophore diversity 
was evaluated by the number of unique phar- 
macophores and the total number of pharma- 
cophores in the product subset, with the goal 
to optimize both the pharmacophoric unique- 
ness of each compound selected and the total 
number of pharmacophores exhibited. En- 
couraging results were obtained, with addi- 
tional work necessary to develop a more gen- 
eral function. 
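A minimal Python sketch of a simulated annealing search over a two-component diversity objective is given below. It is a simplification of the design problem described above: compounds are selected directly rather than through reactant lists, both terms are normalized and equally weighted, and the cooling schedule, the lookup tables, and all names are assumptions rather than details of the published work.

```python
import math
import random


def diversity_score(subset, cell_of, pharm_of, n_cells, n_pharm):
    """Fraction of chemistry-space cells filled plus fraction of the
    available pharmacophores exhibited by the subset."""
    filled = {cell_of[c] for c in subset}
    exhibited = set()
    for c in subset:
        exhibited |= pharm_of[c]
    return len(filled) / n_cells + len(exhibited) / n_pharm


def anneal(compounds, k, score, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Select k compounds (k < len(compounds)) by swapping one member at a
    time and accepting worse subsets with the Metropolis probability."""
    rng = random.Random(seed)
    pool = list(compounds)
    current = set(rng.sample(pool, k))
    current_s = score(current)
    best, best_s = set(current), current_s
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        out = rng.choice(sorted(current))
        into = rng.choice([c for c in pool if c not in current])
        trial = (current - {out}) | {into}
        trial_s = score(trial)
        if trial_s >= current_s or rng.random() < math.exp((trial_s - current_s) / t):
            current, current_s = trial, trial_s
            if current_s > best_s:
                best, best_s = set(current), current_s
    return best, best_s
```

Further terms, such as a penalty for leaving a druglike property range, could be added to the score function without changing the search loop.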

4.8 Oriented-Substituent Pharmacophores 

OSPPREYS is a pharmacophore diversity de- 
scriptor developed specifically for combinato- 
rial library design by Martin and Hoeffel (43). 
Advantage is taken of the common scaffold, so
calculations are performed on the sets of sub- 
stituents. This enables a more detailed phar- 
macophoric description of the library products 
than through calculations that could be prac- 
tically performed on the enumerated prod- 
ucts. By avoiding the problems of having to 
analyze many products with many conforma- 
tions per product, and an explicit dependency 
on the scaffold, a higher spatial resolution 
could be obtained. The analysis of enumerated 
combinatorial libraries by pharmacophoric 
methods is generally limited to smaller virtual 
libraries, with three- or four-point pharma- 
cophores, and limited conformational sam- 
pling, requiring new calculations for every li- 
brary. The oriented-substituent pharmaco- 
phores (OSPs) were developed as a compro- 
mise approach between reactant and product- 
based methods to rectify these limitations. To 
recapture most of the orienting information 
that is lost in fragmenting the enumerated 
products into substituents, two additional 
points are added to each ordinary one-, two-, 
and three-point substituent pharmacophore, 
necessitating approximations through the 
combinatorial conformer and the template 
alignment assumptions. The OSPPREYS 
analysis does, however, account for up to nine- 
point pharmacophore similarity in the products
of a library with three diversity sites. In
addition, the consideration of relatively rigid
substituents reduces the number of structures 
to analyze by up to 10^*^ compared to that of a 
full product-based analysis. This permits a 
thorough conformational sampling of very 
large virtual libraries that would be too slow 
on enumerated structures. A Euclidean prop- 
erty space for diversity analysis is possible be- 
cause of the small number of pairwise sub- 
stituent similarities, enabling options not 
possible by counting set bits in a library union 
fingerprint. The database of oriented substitu- 
ent fingerprints is transferable between li- 
braries, within the restrictions of the noted 
approximations. A major limitation in using 
OSPPREYS is that it can be applied only 
within a combinatorial library, and not be- 
tween libraries. OSPPREYS is well suited to 
maximizing the diversity of scaffolds indepen- 
dently, and can be used to build a screening file 
based on such diversity. 

4.9 Integration 

The previous sections have outlined the basic 
methodology that has been developed in the 
areas of molecular diversity, similarity analysis,
and library design. Traditionally, use of
these methods was limited to a small number 
of exponents within a computational chemis- 
try group because it involved bringing to- 
gether a diverse set of software tools and data 
sources. Combinatorial and high throughput 
chemistry is now well integrated into the re- 
search process and there is a need for bench 
chemists to have access to such tools. The 
Cousin system developed at Upjohn has been 
in use since 1981 (114) and has recently 
evolved to the ChemList system. The system 
includes tools for the browsing of dissimilar 
compounds from substructure searching, use- 
ful in reactant selection, for example. Gobbi et 
al. (115) have described the development of 
the CICLOPS system in use at Novartis. The
system provides functionality for designing 
and registering libraries and associated tasks 
such as accessing reactant availability. Tools 
are provided for filtering reactant lists and se- 
lecting a diverse subset of reactants if re- 
quired. The system is PC-based and is built 
around the Daylight chemical information 
system (13b) and associated tool kits with cus- 
tom Windows clients to control the process. 

Figure 5.16. The workflow used within ADEPT (A Daylight Enumeration and Profiling Tool;
GlaxoWellcome, UK) for compound selection and library design. [Reproduced from A. R. Leach and
M. M. Hann, Drug Discovery Today, 326-336 (2000), with permission of Elsevier Science.]



The ADEPT (A Daylight Enumeration and 
Profiling Tool) suite of programs developed at 
GlaxoWellcome (116) is a Web-based system 
providing access to a wide range of library de- 
sign functionality, again based around the 
Daylight tool kit. Figure 5.16 provides an outline
of the process workflow. Reactant lists are
generated from searches in databases of in-house
and commercially available monomers.
A variety of filters can be applied to reduce the
size of the lists. These include filters on molec- 
ular weight, rotatable bond count, and sub- 
structure filters to remove unwanted func- 
tionality. After library enumeration, various 
property histograms are calculated. This al- 
lows the user to further refine the reactant 
choice. 

A product-based library design algorithm, 
PLUMS (117), has been developed to ensure 
that combinatorial constraints are satisfied in 
the design. The algorithm successively re- 
moves the monomer that adds least value to 
the library as governed by two terms, the ef- 
fectiveness (number of molecules meeting 
user-defined criteria such as property ranges, 
fit to pharmacophore or dock to protein site) 
and efficiency (ratio of effectiveness to library 
size). The algorithm is sufficiently fast to 
work within the Web-based environment of 
ADEPT. Figure 5.17 shows screen shots from
ADEPT, illustrating how a library can be specified
and the resulting product histograms. A
similar system has been implemented at Vertex
(118a). A key component of this system is
the REOS filtering tool (118b), which applies 
filters on molecular weight, lipophilicity, un- 
wanted substructures, rotatable bond counts, 
and so forth to remove "obviously bad" com- 
pounds. 
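The monomer-removal strategy described above for PLUMS can be sketched for a two-component library. This is an illustration under stated assumptions, not the published algorithm: the equal weighting of the two terms, the normalization of effectiveness by the total number of acceptable products, the stopping rule, and all names are hypothetical.

```python
def prune_monomers(acids, amines, good, target_size, weight=0.5):
    """Successively remove the monomer whose loss leaves the best-scoring
    library. `good` is the set of (acid, amine) products that meet the
    user-defined criteria."""
    acids, amines = list(acids), list(amines)
    n_good = len(good)

    def score(a_list, b_list):
        size = len(a_list) * len(b_list)
        hits = sum(1 for a in a_list for b in b_list if (a, b) in good)
        effectiveness = hits / n_good if n_good else 0.0  # coverage of good products
        efficiency = hits / size if size else 0.0         # good products per product made
        return weight * effectiveness + (1.0 - weight) * efficiency

    while len(acids) * len(amines) > target_size:
        trials = []
        if len(acids) > 1:
            for a in acids:
                rest = [x for x in acids if x != a]
                trials.append((score(rest, amines), "acid", a))
        if len(amines) > 1:
            for b in amines:
                rest = [x for x in amines if x != b]
                trials.append((score(acids, rest), "amine", b))
        if not trials:
            break
        _, kind, monomer = max(trials, key=lambda t: t[0])
        if kind == "acid":
            acids.remove(monomer)
        else:
            amines.remove(monomer)
    return acids, amines
```

Because whole monomers are removed, the surviving library is always a full array of the remaining reactants, which is how the combinatorial constraint is satisfied.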

4.10 Structure-Based Library Design

Structure-based library design uses 3D structures
of the biological targets to direct the design
and selection of templates/scaffolds and
of reactants that will produce compounds that 
can fit into the target and thus are likely to 
bind and have biological activity. The experi- 
mental structural information can be derived 
by a structural biology approach, using X-ray 
crystallography or NMR spectroscopy. Com- 
putational models can be built and used (e.g., 
homology modeling techniques for closely re- 
lated proteins), but an experimental structure 
is always preferred. A structural biology ap- 
proach can also be used to identify molecules 
or fragments thereof that bind to a target. For
example, NMR screening (3) can be used to 
identify potential scaffolds or reactants for a 
combinatorial library that bind to a target site 
and is able to detect very low affinity binding 
(in the millimolar range, compared to the low 
micromolar range from biological screening); 
this can be done without the need to deter- 
mine the 3D structure of the target. 
Structure-based drug design (SBDD) is the
topic of another chapter, and key issues such 
as the scoring functions for the ligand-recep- 
tor interaction are not discussed further here. 
The ability to combine SBDD with combinato- 
rial chemistry enables a focused design ap- 
proach that can explore a range of ideas, re- 
ducing the dependency on SBDD limitations 
(structural information, scoring, conforma- 
tional sampling, etc.). The ability to obtain the 
X-ray or NMR structure of new potent molecules
complexed with their targets can also be
critical for the next iteration, in that compu- 
tational structure-based design methods may 
be unable to predict alternative and new bind- 
ing modes, especially because the protein site 
is normally kept rigid and unpredicted confor- 
mational changes can take place during the 
binding process. A review by Stahl (119) dis- 
cusses the technology that directly uses recep- 
tor three-dimensional structures, discussing 
relevant topics such as scoring functions, re- 
ceptor-ligand docking, and practical applica- 
tions. Bohm and Stahl (120) have reviewed 
structure-based library design in terms of mo- 
lecular modeling merging with combinatorial 
chemistry. 

The synergy between combinatorial chem- 
istry and de novo design has been discussed by 
Leach et al. (121). They present an approach 
wherein a template (corresponding to the central
core of a combinatorial library) is positioned
within an acyclic carbon chain whose
length and bond orders are systematically var- 
ied. The conformational space of each result- 
ing structure (core plus chain) is explored, to 
determine whether it is able to link together 
two or more strongly interacting functional 
groups or pharmacophores located within a 
protein binding site. In a second phase, 2D 
queries are derived from the molecular skele- 
tons and used to identify possible reactants 
from a database that would enable the all-car- 
bon linking chains to be replaced by more syn- 
thetically feasible groups. 

Sheridan et al. (122) have published on de- 
signing targeted libraries with genetic algo- 
rithms, extending earlier work, to use the GA 
with 3D scoring methods and showing that the 
approach of assembling libraries from fragments
in high scoring molecules is a reasonable
one. Example applications to two situations
are described: (1) where the 2D structure
of some actives (diverse angiotensin II antag- 
onists) is known, with the goal to design a li- 
brary that best resembles the actives; and (2) 
to simulate the situation where an active site 
(stromelysin-1 in this case) is available and 
the requirement is to design a library of struc- 
tures likely to bind to it. 

Tondi (123) discusses several examples in 
which structure-based drug design and combi- 
natorial library synthesis have worked suc- 
cessfully together in a complementary way. 
These include the discovery of: 

• Potent nonpeptide inhibitors of cathepsin D
(124), which uses CombiBUILD (125), a derivative
of the DOCK (126a, b) approach,
with this structure-based selection approach
yielding seven times as many hits as
a diversity-based procedure.

• Thrombin inhibitors (127), where Bohm et
al. used LUDI to dock and score computationally
available primary amines and then
score the virtual library generated from
benzaldehydes with the top-scoring hit.

• Novel inhibitors of matrix metalloproteinases
(128): Rockwell et al. (128a) used a combinatorial
library at the beginning of the
work to suggest leads suitable for further
optimization that required a conformational
change at the binding site, and a structure of
the complex to enable iterative optimization;
Szardenings et al. (128b) used SBDD to
design the starting scaffold, with synthesis
guiding the introduction of diversity.

• Thymidylate synthase inhibitors (129), using
DOCK to identify the starting lead.

The CombiDOCK program (126c), based on 
DOCK, enables the evaluation of very large 
virtual libraries by using structure-based com- 
binatorial docking. Multiple docked orienta- 
tions of the scaffold are used to evaluate reac- 
tants separately at each of the substitution 
positions. The total docking score for each 
product is rapidly estimated by summing the 
contributions from reactants at each position 
(which are attached as in the final product to 
the docked scaffold, which may be a computa- 
tionally convenient anchor fragment formed 
during the reaction rather than a synthetically
used chemical). Further checks are made
for the highest scoring structures (e.g., for 
steric interactions between reactants at the 
different substitution positions). This approx- 
imation produces an enormous speed-up over 
docking all the individual compounds, which, 
from a time perspective, rapidly becomes pro- 
hibitive for large combinatorial libraries. 
From the scores it is possible to select combi- 
nations of reactants that produce compounds 
complementary to the protein binding site. 
Combinatorial restraints can be applied as re- 
quired to obtain the most efficient use of reac- 
tants and robotics, with an evaluation of any 
reduction in the inclusion of higher scoring 
compounds. 
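The additive approximation behind this kind of combinatorial docking can be sketched in Python. The sketch does not reproduce CombiDOCK: it assumes that per-position reactant scores for one scaffold orientation are already available, that lower scores are better, and that a short list is kept at each position. All names and parameters are hypothetical.

```python
import heapq
from itertools import product


def top_products(site_scores, scaffold_score=0.0, per_site=2, keep=3):
    """Estimate product scores as the scaffold score plus the sum of the
    independently evaluated reactant contributions at each position.

    site_scores: one dict per substitution position, reactant -> score
    (lower is better). Only the best `per_site` reactants at each position
    are combined, so the work grows with the sum of the list lengths
    rather than with their product."""
    shortlists = [
        heapq.nsmallest(per_site, scores.items(), key=lambda kv: kv[1])
        for scores in site_scores
    ]
    ranked = []
    for combo in product(*shortlists):
        names = tuple(name for name, _ in combo)
        total = scaffold_score + sum(score for _, score in combo)
        ranked.append((total, names))
    ranked.sort()
    return ranked[:keep]
```

The highest ranked products would still need the further checks mentioned above, for example for steric clashes between substituents at different positions, because the summed score treats the positions as independent.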

Different strategies for combining diversity 
and structure-based design in site-focused li- 
braries and the DOCK-based CombiBUILD al- 
gorithm are discussed in a review (125), as an 
example of how lead compounds can be rapidly 
identified by combining diversity with struc- 
ture-based design in site-focused libraries. 

Lamb et al. (130) have published on the 
design, docking, and evaluation of multiple li- 
braries against a family of targets, using a sim- 
ilar divide-and-conquer algorithm for side 
chain selection that enables the exploration of 
large lists of reactant substituents with linear 
rather than combinatorial time dependency. 
The method consists of three main stages: (1) 
docking the scaffold, (2) selecting the best substituents
at each site of diversity, and (3) comparing
the resultant structures within and between
the libraries. The scaffold docking
procedure, in conjunction with a novel vector- 
based orientation filter, was shown to be effec- 
tive for several protease targets, reproducing 
experimental binding modes. 

The application of the powerful combination
of SBDD and combinatorial chemistry is
not limited to lead discovery or the optimization
of potency; it extends to the optimization of
the selectivity (using knowledge of the structures
of related targets) and pharmacokinetic/druglike
properties of a molecule. For example,
the structure of a ligand-receptor complex
can clearly indicate areas where chemical
modifications could be made to modulate these
other properties, without directly affecting
binding/potency. Models/structures of ligands
with the cytochrome P450 metabolizing enzymes
are also now becoming available.



5 EXAMPLE APPROACHES 

5.1 General Target Class-Focused 
Approaches 

5.1.1 Defining the Chemical/Biological Space.

The design of target class (gene family) librar- 
ies or compound subsets requires the defini- 
tion of a biologically relevant chemical space. 
This "biological" space can then be used for 
the design and selection of biased/focused libraries
and compound subsets. Many approaches
can be taken, adapting the use of a
wide variety of similarity/diversity descriptors 
(discussed in Section 2.1) to the identification 
of properties associated with a particular tar- 
get class or subset thereof. The goal is to iden- 
tify a feature or set of features that, ideally, is 
specific, but more generally "enriched" for the 
target(s) of interest. A common approach is to 
identify chemical substructures that are char- 
acteristic for the target class, and use these for 
the design. The simplest approach is to include 
such substructures in the library, but the co- 
occurrence of other features is often needed, 
and the quantification of this provides an en- 
hanced design. An example of this combined 
approach is discussed in the next section, us- 
ing the pharmacophore fingerprints expressed 
relative to "privileged" substructures. This 
provides a convenient cell-based partitioning 
approach. Alternatively, it is possible to iden- 
tify properties that are enriched for a particu- 
lar target class, without reference to any 
particular substructures: 1D (e.g., physicochemical),
2D (e.g., ISIS keys, BCUTs), and 3D
(e.g., pharmacophore fingerprints) properties 
can all be used. BCUTs have been used within 
a target (to identify a receptor-relevant sub- 
space, in which actives cluster), to differenti- 
ate within a target class (e.g., ion channel 
openers vs. blockers) and for general target 
class analysis. BCUT chemical space provides 
a way to quantify the "diversity" of certain 
properties within actives for a target class, as 
well as to identify any particular combination 
of properties that actives share. BCUTs have
been used to select representative subsets
from libraries biased to a target class (59a). 

5.1.2 7-Transmembrane G-Protein-Coupled
Receptors. Examples of a product-based com- 
binatorial library design that use four-point 
pharmacophore fingerprints in a "relative" di- 
versity mode have been described for the de- 
sign of combinatorial libraries that contain 
"privileged" substructures focused on 7-TM 
GPCRs. These are a large family of very im- 
portant biological targets lacking high resolu- 
tion experimental 3D structures of the human 
targets; therefore most design has focused 
around the ligands. The occurrence of com- 
mon "privileged" substructures for 7-TM 
GPCRs, often spanning several targets, pro- 
vides a useful focused design method. Some 
example structures are shown in Fig. 5.18. 

A useful modification was made to the stan- 
dard pharmacophore descriptor evaluation 
(37, 80) by forcing one of the points in the 
pharmacophoric description to be a privileged
substructure. This provides a novel quantifi- 
cation of all the 3D pharmacophoric shapes, 
and thus important 3D information relevant 
to the biological activity of the ligands, relative 
to the substructure. This builds on the ability 
of 3D pharmacophoric descriptors to separate
chemical structure from ligand binding prop- 
erties, and enables a fingerprint to be gener- 
ated that describes the possible pharmaco- 
phoric shapes from the viewpoint of that 
special point/substructure (see Fig. 5.19). A 
relative or internally referenced measure of di- 
versity is thus created, enabling new design 
and analysis methods (see Section 2.2.5). The 
goal of the published method was to design 
novel structures, accessible through combina- 
torial chemistry, that have one or more privi- 
leged substructure reactants/cores, and are 
enriched in the relative 3D pharmacophoric 
shapes of known ligands. The method identi- 
fies patterns with other key features that need 
to be present with the privileged substructure, 
such as acids and bases. The optimization can 
also include an enrichment in pharmacophoric 
shapes containing the privileged substructure 
that are not in existing structures, enabling 
the exploration of new 3D pharmacophoric di- 
versity focused around a feature known to be 
important for biological activity. 
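
The idea of forcing one point of every four-point pharmacophore to be the privileged substructure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published implementation: the feature labels, the distance bin edges, the use of a single rigid conformation, and the function names are all invented for the example, and chirality is ignored.

```python
from itertools import combinations
from math import dist

# Assumed upper edges (in angstroms) of the distance bins; not from the text.
BIN_EDGES = (2.0, 4.5, 7.0, 10.0, 15.0)


def bin_index(distance):
    """Return the index of the first bin whose upper edge is >= distance."""
    for index, edge in enumerate(BIN_EDGES):
        if distance <= edge:
            return index
    return len(BIN_EDGES)  # overflow bin for very long distances


def privileged_fingerprint(features, privileged="P"):
    """Collect four-point pharmacophore keys that contain a privileged point.

    features: list of (feature_type, (x, y, z)) tuples for one conformation.
    Returns a set of hashable keys (a binary fingerprint).
    """
    anchors = [f for f in features if f[0] == privileged]
    others = [f for f in features if f[0] != privileged]
    keys = set()
    for anchor in anchors:
        for trio in combinations(others, 3):
            # Order the three remaining points deterministically so that the
            # key does not depend on the input order of the features.
            a, b, c = sorted(
                trio, key=lambda f: (f[0], dist(anchor[1], f[1]), f[1])
            )
            pairs = [(anchor, a), (anchor, b), (anchor, c), (a, b), (a, c), (b, c)]
            binned = tuple(bin_index(dist(p[1], q[1])) for p, q in pairs)
            keys.add((a[0], b[0], c[0]) + binned)
    return keys


# Hypothetical ligand: "P" marks the privileged centroid; the other letters
# stand for donor, acceptor, hydrophobe, and aromatic ring features.
toy_ligand = [
    ("P", (0.0, 0.0, 0.0)),
    ("D", (3.0, 0.0, 0.0)),
    ("A", (0.0, 3.0, 0.0)),
    ("H", (0.0, 0.0, 3.0)),
    ("R", (6.0, 0.0, 0.0)),
]
toy_fp = privileged_fingerprint(toy_ligand)
```

Because every key contains the privileged point, the fingerprint only describes shapes "seen from" that substructure, which is what makes the resulting diversity measure relative rather than absolute. A real implementation would also sum keys over all accessible conformations.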



The Ugi reaction (131), a four-component 
condensation reaction, was chosen and more 
than 100,000 compounds were synthesized. 
Privileged substructures such as biphenyl tet- 
razole were used, for example, at the amine 
position (see Fig. 5.20). Other GPCR privi- 
leged substructures such as diphenyl meth- 
ane, biphenyl tetrazole, and indole were used 
to focus the pharmacophore descriptors (see 
Figs. 5.18 and 5.21). GPCR ligands reported to 
be active at receptors with peptidic endoge- 
nous ligands were identified from the MDDR 
(13a). These compounds were used to provide 
the reference data for the design by calculat- 
ing the union pharmacophore fingerprint of 
compounds containing the privileged substructure
(see Fig. 5.22 for an example structure). A virtual combinatorial library was then
created, and for a particular reactant position, 
the privileged pharmacophore fingerprints 
were calculated for each candidate reactant 
over all the products that would be generated 
if it were used in the library. Either previously 
selected or a representative set of reactants 
were used for the other three components to 
generate the virtual Ugi products. 

The combinatorial library was then de- 
signed by comparing for each reactant the fin- 
gerprint generated from the resultant prod- 
ucts with the fingerprint for the known drug 
ligands (MDDR-fingerprint). Reactants were 
selected by identifying, on a position-by-posi- 
tion basis, reactants that gave products that 
matched the greatest number of these MDDR- 
exhibited privileged pharmacophores. The de- 
sign goal was recalculated after the selection 
of each reactant, by removing the pharma- 
cophores matched by the products generated 
by that reactant from the target list. Subse- 
quent reactants were thus picked based on 
their ability to match the remaining pharma- 
cophores. The approach used was to select the 
first reactant as the one that would give library
compounds with the greatest number of
privileged pharmacophores in common with 
the drug set. The process was continued until 
no more reactants could be found that contrib- 
uted a nontrivial number of new privileged 
pharmacophores. Optimization methods such 
as the HARPick approach (described in Sec- 
tion 4.3) could be used to enable other proper- 
ties, such as flexibility and physicochemical 




properties, to be optimized also. The total
number of pharmacophores (this time without
reference to the privileged substructures) can
also be monitored and optimized. Example results
from one of the Ugi library optimizations
are shown in Fig. 5.23.

Figure 5.19. Example of a "privileged" four-point pharmacophore. Here biphenyl tetrazole, a
substructure seen in a number of GPCR inhibitors, is specifically defined as a pharmacophore feature,
using a centroid dummy atom. Only pharmacophores that include this type are included in the
fingerprint, thus providing a relative measure of diversity/similarity with respect to the privileged
feature. (Feature legend: H-bond donor, acceptor, acid, base, aromatic ring, hydrophobe.)
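
The position-by-position selection described above is essentially a greedy covering procedure: pick the reactant that matches the most still-unmatched reference pharmacophores, remove those from the target list, and repeat until the gain becomes trivial. The following is a minimal sketch of that loop, assuming each candidate reactant has already been reduced to a set of pharmacophore keys; the function name, the `min_new` threshold, and the toy data are illustrative assumptions rather than details from the published work.

```python
def greedy_reactant_selection(reactant_fps, target, min_new=1):
    """Pick reactants one at a time by the number of still-unmatched target keys.

    reactant_fps: dict mapping a reactant name to the set of pharmacophore keys
                  seen in the products that reactant would generate.
    target:       set of keys in the reference (e.g. MDDR-derived) fingerprint.
    min_new:      stop when the best remaining reactant adds fewer keys than this.
    """
    remaining = set(target)
    candidates = dict(reactant_fps)
    selected = []
    while candidates:
        # Sorting the names makes tie-breaking reproducible.
        best = max(
            sorted(candidates), key=lambda name: len(candidates[name] & remaining)
        )
        gain = len(candidates[best] & remaining)
        if gain < min_new:
            break
        selected.append((best, gain))
        # Recalculate the design goal: drop what this reactant already covers.
        remaining -= candidates.pop(best)
    return selected, remaining


# Invented toy data: integers stand in for pharmacophore keys.
toy_fps = {"acid_A": {1, 2, 3}, "acid_B": {3, 4}, "acid_C": {5}, "acid_D": {7, 8}}
picked, unmatched = greedy_reactant_selection(toy_fps, target={1, 2, 3, 4, 5, 6})
```

The per-reactant gains recorded in `picked` correspond to the kind of histogram shown in Fig. 5.23a, and `unmatched` is the part of the reference fingerprint that the chemistry has not yet reached.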

This design illustrates an advantage of a 
partitioning (cell-based) approach. The phar- 
macophore fingerprint can be used to monitor 
progress, to quantify how much of the desired 
goal has been accomplished, and to evaluate 
whether a given chemistry can yield further 
compounds that match the design criteria 
and/or explore new pharmacophoric space.



R1COOH + R2NH2 + R3CHO + R4NC → (MeOH) → Ugi product

Figure 5.20. Example of the Ugi chemistry with
biphenyl tetrazole incorporated as a "privileged"
group at the amine position.



The example here used only a binary finger- 
print, but even more powerful results can be 
obtained when a count for each potential phar- 
macophore is included. The authors showed 
that for these designed Ugi libraries the same 
Ugi chemistry could indeed yield significant 
new diversity for multiple 14,000 compound 
libraries, but that after three libraries dimin- 
ishing returns were obtained. They used the 
understandable nature of the pharmacophore 
descriptor by analyzing the remaining MDDR-pharmacophore
fingerprint to show that most
of the remaining pharmacophores to be 
matched contained acids and/or bases. A mod- 
ified chemistry approach was therefore devel- 
oped using protected acids (t-butyl esters) and 
bases (BOC protected) in the Ugi reaction. 
The unmatched cells in the MDDR-fingerprint
can be related back to the compounds that
generated them, enabling a truly iterative design
of further libraries. In Fig. 5.24 the increasing
size of the pharmacophore fingerprint
from four consecutive Ugi libraries is
illustrated, together with the distribution of
pharmacophoric features in MDDR pharmacophores
that had not been matched.

Figure 5.21. Examples of 7-TM GPCR "privileged" motifs found in the MDDR database:
biphenyl tetrazole (897 compounds across 3 MDDR activity indexes) and
diphenylmethane (487 compounds across 59 MDDR activity indexes).

Figure 5.22. Example of pharmacophore feature assignments involving the biphenyl
tetrazole "privileged" substructure and the total four-point potential pharmacophores
calculated for a GPCR antagonist (features shown: "privileged" feature, hydrophobic
feature, aromatic ring centroid). Total four-point pharmacophores: 3601; with the
"privileged" feature: 1569 (using 10 distance ranges). Note that just the subset (40%)
of the total pharmacophores that contained the "privileged" substructure was used for
the library design.
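
Monitoring how many new reference pharmacophores each consecutive library contributes is simple bookkeeping on sets of fingerprint keys. The sketch below is one hedged way to do it; the function name and the toy numbers are assumptions for illustration and do not reproduce the published data.

```python
def coverage_by_library(library_fps, target):
    """Track new and cumulative target keys matched by consecutive libraries.

    library_fps: list of sets of pharmacophore keys, one per library, in the
                 order the libraries were made.
    target:      set of keys in the reference fingerprint.
    Returns (history, unmatched), where history holds one
    (new_keys, cumulative_keys) pair per library.
    """
    matched = set()
    history = []
    for fingerprint in library_fps:
        new_keys = (fingerprint & target) - matched
        matched |= new_keys
        history.append((len(new_keys), len(matched)))
    return history, target - matched


# Invented toy data: integers stand in for pharmacophore keys.
toy_target = set(range(10))
toy_libraries = [{0, 1, 2, 3, 11}, {2, 3, 4, 5}, {5, 6}, {6}]
history, still_unmatched = coverage_by_library(toy_libraries, toy_target)
```

A falling first entry in each `history` pair is the diminishing-returns signal discussed in the text, and inspecting the features present in `still_unmatched` is what suggests a change in chemistry.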

Another example for GPCR library design 
is the use of BCUT metrics as the basis for 
target class-focused approaches to accelerated 
drug discovery. A particularly interesting ex- 
ample is work done by Wang and Saunders 
(32e,i) at Neurocrine Bioscience in their effort
to discover novel nonpeptidic ligands for a particular
member (GPCR-1) of the GPCR-PA+
family of receptors activated by peptides carrying
an obligatory positive charge. They and
their colleagues performed a thorough search
of the literature and identified a few hundred
ligands of the various members of the GPCR-PA+
family. Knowing that it is usually not
useful to follow up hits or leads showing very
poor affinity, they eliminated ligands with less
than […]00 µM affinity for the corresponding receptor.
Significantly, they also eliminated ligands
with better than 1 µM affinity for the
corresponding receptor. This very unusual
step was taken in an effort to convince their
colleagues that the method they intended to
use was not reliant on knowing the answer
ahead of time. This left 187 ligands with affinities
mostly between 10 and 70 µM for various
members of the GPCR-PA+ family of receptors.
Using these compounds, they perceived a
three-dimensional BCUT subspace within
the corporate chemistry space that clusters
the ligands of individual members of the
GPCR-PA+ family and appears to be appropriate
for this target class. The positions of all
ligands in the 3D chemistry space shown
in Fig. 5.25 were originally indicated by open
circles. All ligands of some but not all
receptors were then color-coded as indicated,
with red GPCR-2 and yellow GPCR-4 ligands
hidden under the green GPCR-1 ligands.
The gray oval provides a crude indication of
the region of chemistry space of interest for
GPCR-PA+ receptors.

Figure 5.23. (a and b) Contributions per acid reactant of pharmacophores for
optimization in the Ugi reaction (with biphenyl tetrazole as the "privileged" motif at
the amine position). The order shown is the final selected order of reactants, based on
obtaining the maximum number of new privileged pharmacophores per additional
reactant. Histogram a shows the number of new pharmacophores added by each new
selected reactant in the "privileged" pharmacophoric space defined by known GPCR
compounds containing the biphenyl tetrazole; shown in histogram b is the matching
increase in the total number of pharmacophores for the library for each new selected
reactant.

Figure 5.24. On the left is shown the cumulative (black) total number of four-point
pharmacophores from consecutive 14,000-compound sets of Ugi libraries designed for
7-TM GPCR targets, together with the total number of pharmacophores in each library
(in gray). Note the diminishing yield of new pharmacophores with later libraries,
indicating that a change in strategy is needed. On the right are shown the features
present in the resultant unrepresented pharmacophores (i.e., found in 7-TM GPCR
biphenyl tetrazole-containing compounds in MDDR but not in synthesized libraries),
indicating a strategy change to include more acids and bases together with the biphenyl
tetrazole.

Figure 5.26 indicates the positions of
about 2000 Neurocrine compounds selected
from 14 different combinatorial libraries
based on 14 different and proprietary scaffolds.
Rather than selecting compounds only
near the known ligands of GPCR-1, their receptor
of interest, Wang and Saunders also selected
compounds spanning the entire region of
interest for GPCR-PA+ receptors. This was done to further
convince their colleagues, as explained below.
All 2000 compounds were screened for activity
against the GPCR-1 receptor. Those testing
positive were retested in a secondary, functional
assay. All but two compounds having
better than 100 nM affinity for the GPCR-1
receptor are colored blue and/or are located
within the blue oval. All but one compound
having better than 10 nM affinity for the
GPCR-1 receptor are colored red and/or are
located within the red oval. All compounds
with better than 2 nM affinity are colored
green and are located within the two small
green ovals within the larger green oval, consistent
with the two crude clusters of GPCR-1
ligands seen in Fig. 5.25. The fact that these
two small ovals each contain products from
several different libraries (scaffolds) suggests
the possible existence of two binding modes
for this receptor. It is also significant to note
that, although the authors intentionally synthesized
compounds within the entire region
of interest for GPCR-PA+ receptors, the only
compounds showing significant affinity for the
GPCR-1 receptor were located close to the
known GPCR-1 ligands (compare with Fig.
5.25), thus supporting the use of BCUT coordinates
(on receptor-relevant axes) as a valid
approach to virtual high throughput screening.
The tight clustering of GPCR-PA+ ligands
in both figures clearly suggests that
BCUT metrics represent, albeit in a relatively
crude fashion, the same sort of information as
would be represented in a description of the
pharmacophore for the receptor of interest.

Figure 5.25. The 3D subspace most receptor relevant for members of the GPCR-PA+ family of
receptors. Points indicate coordinates of 187 published ligands of various GPCR-PA+ receptors.
Some have been color-coded by receptor for illustrative purposes. See Refs. 32e,i and text for further
details. See color insert.

5.2 Property-Biased Design 

The use of pharmacophoric descriptors in en- 
hancing the hit-to-lead properties of lead opti- 
mization libraries has been described (76). 
Pharmacophore fingerprints, based on the 
Chem-X/ChemDiverse multiple pharmaco- 
phore descriptors, were used and several is- 
sues in the design of lead optimization librar- 
ies were addressed. The applicability of 



pharmacophoric methods to the design of fo- 
cused libraries was demonstrated in this case, 
where the aim was to design the library to- 
ward a known lead or leads. The authors also 
investigated the design of libraries with im- 
proved pharmacokinetic properties. Simple 
and rapidly computable descriptors applicable 
to the prediction of drug transport properties 
were used, and the results illustrate a common 
problem: to obtain the best results it may be 
necessary to synthesize libraries in a noncombinatorial
manner. A Monte Carlo search procedure
was devised to enable the selection of a
near-combinatorial subset in which all library
members satisfy the design criteria. By including
calculated logP, molecular weight, and polar
surface area in the design of a combinatorial
library, it was shown that compounds
with improved absorption characteristics (as
determined by experimental Caco-2 measurements)
could be obtained.

Figure 5.26. The same 3D subspace as in Fig. 5.25, rotated slightly to provide a better viewing
perspective. Points indicate coordinates of about 2000 combinatorial products selected from 14
different libraries. Color-coding indicates affinity for the GPCR-1 receptor. See Refs. 32e,i and text for
further details. See color insert.

The use of computational methods such as
reactant clustering and library profiling to
maximize reactant diversity and optimize
pharmacokinetic parameters has been described
(132), with four-point pharmacophore



fingerprint analysis used to quantify the 
added diversity gained by using two indepen- 
dent synthetic routes. 

5.3 Site-Based Pharmacophores 

Pharmacophore fingerprints generated from
complementary site points can be used to direct
combinatorial library design and to investigate
selectivity. The pharmacophore
fingerprinting method has been validated for
selectivity studies (37a, b) on
three closely related serine proteases: thrombin,
Factor Xa, and trypsin. Site points were
positioned in the active site of each protein
using the results of GRID (42) analyses (see 
Fig. 5.5), and receptor-based four-point phar- 
macophore fingerprints were generated. 
Fingerprints were also generated using full 
conformational flexibility for some highly se- 
lective and potent thrombin and Factor Xa in- 
hibitors. Receptor-based similarity was inves- 
tigated as a function of common potential 
three- and four-point pharmacophores for 
each ligand/receptor pair. The results indi- 
cated that the use of just the common poten- 
tial four-point pharmacophores could give in- 
formation pertaining to relative enzyme 
selectivity; when three-point pharmacophores 
were used, however, poor resolution of en- 
zyme selectivity was observed. The thrombin 
inhibitor thus exhibited greater similarity 
with the complementary four-point pharma- 
cophore fingerprints of the thrombin active 
sites than with the potential pharmacophore 
keys generated from the other enzymes; a sim- 
ilar result was found for the Factor Xa inhibi- 
tors with the Factor Xa site. 
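
Receptor-based similarity as used here amounts to counting the potential pharmacophores a ligand fingerprint has in common with each site fingerprint and comparing the counts across sites. A minimal sketch follows, assuming fingerprints are available as sets of hashable keys; the function names and the toy sets are invented for illustration and are not data from the cited studies.

```python
def common_keys(ligand_fp, site_fp):
    """Number of potential pharmacophores shared by a ligand and a site."""
    return len(ligand_fp & site_fp)


def rank_sites(ligand_fp, site_fps):
    """Rank sites by shared pharmacophores (highest first, ties by name)."""
    scores = {site: common_keys(ligand_fp, fp) for site, fp in site_fps.items()}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Invented toy fingerprints: integers stand in for four-point keys.
toy_sites = {
    "thrombin": {1, 2, 3, 4, 5},
    "factor_Xa": {4, 5, 6, 7},
    "trypsin": {5, 8},
}
toy_inhibitor = {1, 2, 3, 4, 9}
ranking = rank_sites(toy_inhibitor, toy_sites)
```

With real fingerprints the raw counts depend on site size and on the number of site points, so some normalization may be needed before comparing enzymes; that choice is left open in this sketch.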

Clearly, the inclusion of the shape of the 
binding site should improve the resolution, 
and the DiR (Design in Receptor) approach 
(133) refines the process, requiring that the 
pharmacophoric match fits the shape of the 
target site (i.e., is sterically compatible with 
the site). This clearly provides much addi- 
tional information at the expense of greatly 
increased calculation time. Within the DiR ap- 
proach, two-, three-, and four-point potential 
site pharmacophores can be used. This pro- 
vides interesting new library design possibili- 
ties, in that it is possible to evaluate which 
ligands are able to fit in the site by matching at 
least one set of pharmacophoric features, and 
to quantify which pharmacophore hypotheses 
are matched. A subset of ligands can then be 
designed that match as many different phar- 
macophoric hypotheses as possible, and the bi- 
ological screening of the resultant compounds 
can determine which bind best. Alternatively, 
pharmacophore constraints can be applied to a 
shape-driven searching approach, and Good et 
al. (34) have shown the effectiveness of this 
with the DOCK virtual screening/docking ap- 
proach, in which the addition of pharmaco- 
phore constraints improved both the enrich- 
ment and speed of the process. 



The active site of the Factor Xa serine pro- 
tease (134) has been used for combinatorial 
library design (37c, d) using the DiR approach. 
GRID analyses using probes for hydrogen 
bond donors, acceptors, bases, acids, and hy- 
drophobes resulted in 23 complementary site 
points being added (see Fig. 5.5). The shape of 
the active site was defined using 162 protein 
atoms. To ensure that a relevant area of the 
binding site was being explored (based on 
knowledge of X-ray protein-ligand complexes), 
site pharmacophores were forced to contain a 
hydrophobe or aromatic ring centroid point 
from both the S1 and S4 regions of the binding
pocket. By using this focused approach, a "di- 
versity" of matched site pharmacophores was 
obtained, representing a sampling of "reason- 
able" binding modes related to those experi- 
mentally observed and, thus, presumably hav- 
ing a higher probability of giving rise to 
biological activity. This focused approach re- 
duced the total number of site pharmaco- 
phores from 5393 to 775 [using the seven dis- 
tance ranges setting (37a) and considering all 
distances in the 1-15 Å range]. The approach
was validated by the identification of feasible 
binding models (37c), similar to that experimentally
observed for a known Factor Xa inhibitor.
The Ugi four-component condensation
reaction (131) (see Fig. 5.20) was used for
the study and is capable of producing very
large numbers of different structures from
commercially available reactants. An example
of the power of the method was given, whereby
products were selected semimanually from a
small virtual library of 432 products (37c,d).
Products were constructed from the four reactant
sets: carboxylic acids (R1 × 3), amines (R2
× 2), aldehydes (R3 × 3), and isonitriles (R4 ×
24). The pharmacophore-based site analysis
showed the optimum positions of substitution 
and chain length for benzamidine-containing 
fragments (targeted to the aspartate-containing
S1 pocket) and the optimum lengths of
other hydrophobic reactants (targeted to the
S4 pocket) to produce compounds that would
sample the maximum number of binding
modes. In this case the groups were always
forced to be in the S1 and S4 pockets to maintain
"reasonable" binding modes, although
this restriction could be excluded to probe
even further potential binding modes. Thus,
as the identity of the matched site pharmacophore(s)
was known for each compound, target
site-based diversity of binding modes could
be explored in the design process. An opti- 
mized selection of reactants was possible, and 
the value to the design of reactants with dif- 
ferent chain lengths could be evaluated. 
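
The size of the small virtual library follows directly from the four reactant sets: 3 acids, 2 amines, 3 aldehydes, and 24 isonitriles give 3 × 2 × 3 × 24 = 432 products. The enumeration can be sketched as a Cartesian product; the reactant names below are placeholders, since the text does not list the actual reagents.

```python
from itertools import product

# Placeholder reactant names; only the set sizes come from the text.
acids = [f"acid_{i}" for i in range(1, 4)]               # R1 x 3
amines = [f"amine_{i}" for i in range(1, 3)]             # R2 x 2
aldehydes = [f"aldehyde_{i}" for i in range(1, 4)]       # R3 x 3
isonitriles = [f"isonitrile_{i}" for i in range(1, 25)]  # R4 x 24

# Every Ugi product is one choice from each of the four reactant sets.
virtual_library = list(product(acids, amines, aldehydes, isonitriles))
library_size = len(virtual_library)
```

In a site-based design each tuple would then be built in 3D and tested against the site pharmacophores, so that the matched hypotheses can be recorded per product.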



6 CONCLUSIONS AND FUTURE 
DIRECTIONS 





Similarity and diversity metrics have been 
successfully used for a variety of tasks, includ- 
ing virtual screening, subset selection, and 
combinatorial library design. Databases of vir- 
tual compounds (e.g., from validated combina- 
torial chemistry protocols and reactants) can 
be used for both virtual screening and library 
design (virtual screening on virtual libraries 
with additional combinatorial constraints). 
The ability to exploit rapidly large virtual libraries
of compounds that could be made by
validated combinatorial chemistry protocols 
provides very powerful virtual screening and 
library design approaches. Future directions 
for library design will involve the application 
of such approaches in a fully integrated fash- 
ion (e.g., the ADEPT tool described in Section 
4.10) and further enhancements to the con- 
straints necessary to achieve druglike com- 
pounds (e.g., 80% compliance to the Rule of 5, 
predictive models for metabolism- and toxici- 
ty-related issues). Where the goal is lead gen- 
eration (e.g., to enrich the compound screen- 
ing file for high throughput screening), a focus 
will be on target classes (gene families) of in- 
terest, and the generation of compounds with 
leadlike properties, such as a lower molecular
weight. The move away from combinatorial
libraries to sparse arrays and noncombinatorial
(cherry-picked) libraries (90) will continue,
enabling more effective designs with
control of associated properties. However, as
more property constraints are applied to the
library designs for leadlike/druglike properties,
the need to include positive design elements
to ensure good biological activity is emphasized.
The goal for drug discovery is thus
to identify targets and to generate compounds
that are at the intersection of chemical, biological,
and druglike property (e.g., absorption,
toxicophores) space. Different targets and different
expected routes of administration will
require different constraints, and an element 
of diversity (with constraints toward a drug- 
occupied chemical space) will remain impor- 
tant, to enable the most effective use of com- 
binatorial library chemistry and to discover 
new leads for both established and new targets.



1 INTRODUCTION

Virtual screening, sometimes also called in 
silico screening, is a new branch of medicinal 
chemistry that represents a fast and cost- 
effective tool for computationally screening 
compound databases in search of novel drug
leads. The roots for virtual screening go back 
to structure-based drug design and molecular 
modeling. In the 1970s researchers hoped to 
find novel drugs designed rationally using a 
fast growing number of diverse protein structures
being solved by X-ray crystallography (1,
2) or nuclear magnetic resonance (NMR) spectroscopy
(3). However, only very few drugs
have resulted from those early efforts. Exam- 
ples include captopril as angiotensin-convert- 
ing enzyme inhibitor (4) and methotrexate as 
dihydrofolate reductase inhibitor (5). The rea- 
sons for this somewhat disappointing drug 
yield lie in the low resolution of the protein 
structures as well as limitations in computer 
power and methods. Researchers have often
tried de novo to design the final drug candidate 
on the computer screen. The compounds sug- 
gested have often been difficult to synthesize; 
initial failure in exhibiting potency has often 
resulted in the termination of structure-based 
projects. At the end of the 1980s rational drug 
design techniques became somewhat discred- 
ited because of the high failure rate in drug 
discovery projects. 

In the 1990s drastic changes occurred in 
the way drugs are discovered in the pharma- 
ceutical industry. High throughput synthesis 
(6, 7) and screening techniques (8) changed 
the lead identification process that is now gov- 
erned not only by large numbers of com- 
pounds processed but also by fast prosecution 
of many putative drug targets in parallel. The 
characterization of the human genome has resulted
in a large number of novel putative
drug targets. Improved screening techniques 
also make it possible to look at entire gene 
families, at orphan targets, or at otherwise un- 
characterized putative drug targets. In this 
environment of data explosion, rational design 
techniques have experienced a comeback (9). 
Although the exponentially growing number 
of solved protein structures at high resolution 
makes it possible to embark on structure- 
based design for many drug targets, virtual 
screening — the computational counterpart to 
high throughput screening — has become a 
particularly successful computational tool for 
lead finding in drug discovery. Whereas proprietary
screening libraries typically hold
about 10^6 compounds, this is only a tiny fraction
of the conceivable chemical space for
which estimates range between 10^60 and 10^100
compounds (10, 11). The question is, of course,
which subset of this enormous space should be 
synthesized and screened? Virtual screening 
attempts to answer this question by evaluat- 
ing large virtual libraries of up to 10^^ com- 
pounds through the use of a cascade of various 
screening tools to reduce the chemical space. 
This chapter describes the different concepts 
and tools used today for virtual screening. 
They reach from the assessment of the overall 
"druglikeness" of a small organic molecule to 
its ability to specifically bind to a given drug 
target. The interested reader is also referred 
to a selection of recent books and reviews on 
the subject of virtual screening (10, 12-18). 



2 CONCEPTS OF VIRTUAL SCREENING 

The basic goal of virtual screening is the re- 
duction of the enormous virtual chemical 
space of small organic molecules, to synthesize
and/or screen against a specific target protein,
to a manageable number of compounds that 
exhibit the highest chance to lead to a drug 
candidate (10, 19). The major sources of infor- 
mation to guide virtual screening for a partic- 
ular target are derived from the following 
questions: 

1. What does a drug look like in general? 

2. What is known about compounds that in- 
teract with the receptor? 

3. What is known about the structure of the 
target protein and the protein-ligand 
interactions? 

In the following subsections we address 
these three points, outlining concepts of assessing
the overall druglikeness of molecules,
the concentration of subsets of molecules in 
focused libraries, and the identification of spe- 
cific leads through structure-based virtual 
screening techniques. 
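
The three questions map naturally onto a cascade in which each stage only sees the survivors of the previous one. The sketch below shows that control flow only; the stage predicates, the field names, and the score threshold are hypothetical stand-ins for real druglikeness, similarity, and structure-based tools.

```python
def run_cascade(compounds, stages):
    """Apply filters in order and record how many compounds survive each one.

    compounds: iterable of compound records (here plain dictionaries).
    stages:    list of (stage_name, keep) pairs, where keep(compound) -> bool.
    """
    survivors = list(compounds)
    report = []
    for name, keep in stages:
        survivors = [compound for compound in survivors if keep(compound)]
        report.append((name, len(survivors)))
    return survivors, report


# Invented toy records; all field names and values are assumptions.
toy_compounds = [
    {"id": "c1", "druglike": True, "similar_to_known_ligand": True, "score": -9.1},
    {"id": "c2", "druglike": True, "similar_to_known_ligand": False, "score": -10.0},
    {"id": "c3", "druglike": False, "similar_to_known_ligand": True, "score": -11.2},
    {"id": "c4", "druglike": True, "similar_to_known_ligand": True, "score": -5.0},
]
toy_stages = [
    ("druglikeness", lambda c: c["druglike"]),
    ("focused library", lambda c: c["similar_to_known_ligand"]),
    ("structure-based", lambda c: c["score"] <= -8.0),
]
hits, report = run_cascade(toy_compounds, toy_stages)
```

Ordering the stages from cheapest to most expensive keeps the costly structure-based step for the smallest set of compounds.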



2.1 Druglikeness Screening 



Many drug candidates fail in clinical trials be- 
cause of reasons unrelated to potency against 
the intended drug target. Pharmacokinetics 
and toxicity issues are blamed for more than 
half of all failures in clinical trials. Therefore, 
the first part of virtual screening evaluates the 
druglikeness of small molecules, mostly independent
of their intended drug target (there
are specific drug classes such as those acting in 
the central nervous system that require spe- 
cific drug profiles). Druglike molecules exhibit 
favorable absorption, distribution, metabo- 
lism, excretion, and toxicological (ADMET) 
parameters (20-24). They are synthetically 
feasible and possess pharmacophore features 
that offer the chance of specific interactions 
with the intended protein target. Druglikeness
is currently assessed using the following
types of methods: simple counting methods,
functional group filters, topological filters, and
pharmacophore filters. Computational techniques
used to identify druglikeness include
neural networks (25-27), recursive partitioning
approaches (25, 28), and genetic algorithms
(29). These methods are further discussed
below.



Table 6.1 Typical Ranges for Parameters Related to Druglikeness^a

Parameter                   Minimum    Maximum
LogP                        -2         5
Molecular weight            200        500
Hydrogen bond acceptors     0          10
Hydrogen bond donors        0          5
Molar refractivity          40         130
Rotatable bonds             0          8
Heavy atoms                 20         70
Polar surface area [Å²]     0          120
Net charge                  -2         +2

^a Data taken from ref. 21.



2.1.1 Counting Schemes. Database collections
of known drugs [e.g., CMC (30), WDI
(31), or MDDR (32)] are typically used to extract
knowledge about structure and properties
of potential drug molecules. Key physicochemical
properties such as molecular weight,
charge, and lipophilicity (33, 34) of drug collections
are profiled to extract simple counting
rules for relevant descriptors of ADMET-related
parameters. Examples include Lipinski's
"rule-of-five" (33), which limits the range for
molecular weight (MW ≤ 500), computed
octanol-water partition coefficient (ClogP ≤ 5),
and hydrogen-bond donors and acceptors
(OHs + NHs ≤ 5; Ns + Os ≤ 10). Other authors
limit the number of rotatable bonds (RB
≤ 8) or rings in a molecule (number of rings
≤ 4) (34). Table 6.1 shows a list of typical
boundaries of counting parameters. Figure 6.1
illustrates the profiling procedure for these
counting parameters using polar surface area
(PSA) (35) as a descriptor. Collections of 776
orally administered CNS drugs and 1590
orally administered non-CNS drugs that
reached phase II efficacy studies were analyzed
for their PSA. It was found that 90% of
the non-CNS compounds have a PSA below
120 Å²; 90% of CNS drugs have a PSA below
80 Å². Although it is possible that drugs have
higher PSA values and are still orally bioavailable
or penetrate the blood-brain barrier (as
the result of active transport or other reasons),
the profile suggests that it is much less
likely. It is therefore a reasonable assumption
in a virtual screening approach to discriminate
against compounds outside the most populated
descriptor space (in this case, PSA
< 120 Å²), especially if the compound lies outside
the optimal region for several descriptors
(e.g., MW > 500 and ClogP > 5).

Figure 6.1. Distribution of polar surface area for 776 orally administered CNS drugs
(black bars) and for 1590 orally administered non-CNS drugs (white bars) that have
reached clinical phase II efficacy studies (35).

Simple descriptors as described above are 
quickly calculated and counted. Therefore, af- 
ter typically removing compounds with atoms 
other than C, N, O, S, H, P, Si, Cl, Br, F, and I, 
counting schemes present the first filter in vir- 
tual screening approaches. 
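
A counting filter of this kind reduces to an element whitelist followed by range checks on precomputed descriptors. The sketch below uses the ranges of Table 6.1 and the element list given above; the descriptor key names, the `max_violations` tolerance, and the toy compound are assumptions, and the descriptor values themselves are presumed to come from whatever property calculator is in use.

```python
# (minimum, maximum) per descriptor, taken from Table 6.1.
RANGES = {
    "logp": (-2, 5),
    "mol_weight": (200, 500),
    "hb_acceptors": (0, 10),
    "hb_donors": (0, 5),
    "molar_refractivity": (40, 130),
    "rotatable_bonds": (0, 8),
    "heavy_atoms": (20, 70),
    "polar_surface_area": (0, 120),
    "net_charge": (-2, 2),
}

# Elements retained before the counting step, as listed in the text.
ALLOWED_ELEMENTS = {"C", "N", "O", "S", "H", "P", "Si", "Cl", "Br", "F", "I"}


def range_violations(descriptors):
    """Names of supplied descriptors that fall outside their allowed range."""
    return [
        name
        for name, (low, high) in RANGES.items()
        if name in descriptors and not low <= descriptors[name] <= high
    ]


def passes_counting_filter(elements, descriptors, max_violations=0):
    """True if only allowed elements occur and few enough ranges are violated."""
    if not set(elements) <= ALLOWED_ELEMENTS:
        return False
    return len(range_violations(descriptors)) <= max_violations


# Invented toy compound with precomputed descriptor values.
toy_descriptors = {
    "logp": 2.1,
    "mol_weight": 350.0,
    "hb_acceptors": 5,
    "hb_donors": 2,
    "rotatable_bonds": 6,
    "polar_surface_area": 75.0,
    "net_charge": 0,
}
toy_ok = passes_counting_filter(["C", "H", "N", "O"], toy_descriptors)
```

Setting `max_violations` above zero gives the softer behavior suggested in the text, where a compound is penalized mainly when it lies outside the optimal region for several descriptors at once.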

2.1.2 Functional Group Filters. Reactive,
toxic, or otherwise unsuitable compounds, 
such as natural product derivatives, are re- 
moved using specific substructure filters. Fig- 
ure 6.2 shows a subset of substructures that 
lead to the dismissal of compounds in virtual 
screening. Typical reactive functional groups 
include, for example, reactive alkyl halides, 
peroxides, and carbazides. Unsuitable leads 
may include crown ethers, disulfides, and ali- 
phatic methylene chains seven or more long. 
Unsuitable natural products may include qui- 
nones, polyenes, or cycloheximide derivatives. 
A list of such fragments coded in Daylight 
SMARTS is given, for example, by Hann and 
co-workers (36). It should be noted, however,
that natural product derivatives are not al- 
ways unsuitable leads. 

Screening out compounds that contain cer- 
tain atom groups associated with toxicity pro- 
vides a practical and fast way to reduce large 
databases; however, it is only a crude
approximation for eliminating potentially toxic
compounds. Better descriptions of toxicity may be
provided by structure-based methods to assess 
toxicity of compounds. They draw primarily 
from mutagenicity, carcinogenicity, and acute 
toxicity databases assembled, for instance, by 
the National Toxicology Program (37) and the 
Registry of Toxic Effects of Chemical Substances
database, RTECS (38). CASETox (39), TOPKAT (40),
and DEREK (41) are commercial software products
that can be used to evaluate virtual compounds
for potential toxicity.

Figure 6.2. Selection of reactive functional groups
that should be removed from a virtual screen
(examples taken from Ref. 212).

Figure 6.3. Neural network architecture for
prediction of druglikeness (output neuron:
drug = 1, nondrug = 0).

2.1.3 Topological Drug Classification. It is
generally assumed that compounds with structural
similarity to known drugs may exhibit druglike
properties themselves, such as oral
bioavailability, low toxicity, membrane
permeability, and metabolic stability. Following
this idea, drug databases and reagent databases
such as the ACD (42), used as a negative control
(assuming they do not contain many drugs), have
been analyzed to find structural features of drugs
and nondrugs. Neural network approaches have been
devised (25, 27) that can discriminate between
drugs and nondrugs with about 80% accuracy.
Recursive partitioning approaches classify drugs
and nondrugs with similar accuracy.

2.1.3.1 Artificial Neural Networks and Decision
Trees. Figure 6.3 shows an example of a simple
neural network that uses Ghose and Crippen atom
types (43) to code the molecular structure.
Ninety-one statistically significant atom types
correspond to 91 input neurons of the neural net.
Typically, five neurons in the hidden layers are
used in the net design (25, 27). The single-neuron
output layer can vary between 0 (nondrugs) and 1
(drugs). Trained on 5000 drugs taken from the WDI
and 5000 compounds labeled nondrugs taken from the
ACD, the resulting neural net was shown to
correctly classify about 80% of other
drugs/nondrugs (27).
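
The architecture just described (91 inputs, five hidden neurons, one output) can be illustrated with a plain-Python forward pass. This is a sketch, not the published model: a single hidden layer, sigmoid activations, and random untrained weights are all assumptions, and the function names are hypothetical.

```python
import math
import random

N_INPUT, N_HIDDEN = 91, 5  # layer sizes quoted in the text


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(seed=0):
    """Random, untrained weights; a real model would be fitted to WDI/ACD data."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUT)]
                for _ in range(N_HIDDEN)]
    b_hidden = [0.0] * N_HIDDEN
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)]
    b_out = 0.0
    return w_hidden, b_hidden, w_out, b_out


def drug_score(atom_type_counts, network):
    """Forward pass: 91 atom-type counts -> 5 hidden neurons -> 1 output in (0, 1)."""
    if len(atom_type_counts) != N_INPUT:
        raise ValueError("expected 91 atom-type counts")
    w_hidden, b_hidden, w_out, b_out = network
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, atom_type_counts)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
```

After training, an output near 1 would be read as druglike and an output near 0 as nondrug; with the random weights used here the score carries no meaning.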

Recursive partitioning, also known as the 
decision tree approach, is another powerful 
method to extract knowledge from a database. 
Wagener and Geerestein have explored the 
WDI and ACD databases to train a decision 
tree for the discrimination of drugs and nondrugs
(28). Figure 6.4 shows a partial decision tree
derived by the authors. One rule derived from this
partial tree is, for example, if a compound
possesses no alcohol and a tertiary aliphatic
amine but no methylene linker between a heteroatom
and a carbon atom, it is not druglike. It is
interesting to note that just by testing the
presence or absence of hydroxyl, tertiary and
secondary amines, carboxyl, phenol, or enol
groups, 75% of all druglike structures in the MDDR
and CMC can be recognized.

Figure 6.4. Partial decision tree from Wagener and
Geerestein (28); the nodes test, in order, for
alcohol, tertiary amine, secondary amine, and
phenol, enol, or carboxyl. C(n)sp³ describes a
carbon with hybridization sp³ and formal oxidation
number n. X refers to a heteroatom; R refers to
any group linked through a carbon. The tree starts
at the top left corner. Here is an example of how
to read the tree: If a compound contains an
alcohol, it is classified as a drug. If it does
not contain an alcohol, the presence of a tertiary
amine is checked. If it contains a tertiary amine
and also contains (does not contain) a CH2 group
with attached heteroatom as well as another R
group, it is classified as drug (nondrug).

Figure 6.5. Dissection of a drug molecule into
framework and side chains.

Once they are trained, neural networks 
and decision trees are very fast filter tools in 
virtual screening approaches. They are there- 
fore applied early in the virtual screening filter 
cascade. 
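
The reading example given for the partial tree of Fig. 6.4 translates directly into nested conditions. The sketch below encodes only the branches spelled out in the text; the function and argument names are hypothetical, the substructure flags are assumed to come from a separate substructure search, and the deeper branches are deliberately left undecided.

```python
def classify_partial_tree(has_alcohol, has_tertiary_amine, has_ch2_x_r):
    """Return "drug", "nondrug", or None where the partial tree gives no answer.

    has_ch2_x_r: True if the compound contains a CH2 group with an attached
    heteroatom as well as another R group.
    """
    if has_alcohol:
        return "drug"
    if has_tertiary_amine:
        return "drug" if has_ch2_x_r else "nondrug"
    # Branches below this point (secondary amine; phenol, enol, carboxyl)
    # are not described in enough detail to reproduce.
    return None
```

Because each decision is a single substructure test, a trained tree of this kind can be evaluated very quickly, which is why it sits early in the filter cascade.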

2.1.3.2 Structural Frameworks and Side Chains of
Known Drugs. Databases have been mined to find
structural motifs and pharmacophore features of
small molecules that characterize drugs. Bemis and
Murcko (44) dissected drug molecules from the
Comprehensive Medicinal Chemistry (CMC) (30)
database into side chains and frameworks
(containing ring systems and linkers). They found
that only 32 frameworks described the shapes of
half the 5120 drugs in the CMC containing 1170
scaffolds. Figures 6.5 and 6.6 show the process of
reducing a drug molecule to its framework and a
list of the most frequently occurring frameworks
in the CMC. Side chains most frequently occurring
in drug molecules have also been analyzed (45). It
has been found that of the 15,000 side chains
contained in the CMC, about 11,000 belong to one
of only 20 side chains, including (starting with
the most frequent): carbonyl, methyl, hydroxyl,
methoxy, chloro, methylamine, primary amine,
carboxylic acid, fluoro, and sulfone. Most
molecules possess between one and five side
chains; more than 20% of the drugs stored in the
CMC have two side chains per molecule.

Figure 6.6. Most frequently occurring frameworks in
drugs (numbers indicate percentages of occurrence
in CMC database). Data are taken from Bemis and
Murcko (44).
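
One simple way to picture the dissection into framework and side chains is as pruning on the molecular graph: repeatedly remove atoms that have at most one remaining neighbor, and what survives is the set of ring systems plus the linkers between them. The sketch below is an assumption-laden illustration, not the published procedure: atoms are plain integer labels, bond orders and atom types are ignored, and the example molecule is hypothetical.

```python
def framework_atoms(bonds):
    """Return the atoms left after repeatedly removing terminal atoms."""
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    terminal = [atom for atom, nbrs in neighbors.items() if len(nbrs) <= 1]
    while terminal:
        atom = terminal.pop()
        if atom not in neighbors:
            continue  # already removed in an earlier step
        for nbr in neighbors.pop(atom):
            neighbors[nbr].discard(atom)
            if len(neighbors[nbr]) <= 1:
                terminal.append(nbr)
    return set(neighbors)


# Hypothetical molecule: two six-membered rings joined by a two-bond linker,
# with a one-atom side chain on ring 1 and a two-atom side chain on the linker.
ring1 = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]
linker = [(6, 7), (7, 8)]
ring2 = [(8, 9), (9, 10), (10, 11), (11, 12), (12, 13), (13, 8)]
side_chains = [(1, 14), (7, 15), (15, 16)]
bonds = ring1 + linker + ring2 + side_chains

framework = framework_atoms(bonds)
side_chain_atoms = {a for bond in bonds for a in bond} - framework
```

In this example the framework keeps atoms 1 to 13 (both rings and the linker atom), while atoms 14, 15, and 16 are assigned to side chains. An acyclic molecule is pruned away completely.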

The presence or absence of druglike frameworks,
side chains, or structural motifs can be used to
analyze virtual libraries in virtual screening.
This idea has been extended in RECAP
(retrosynthetic combinatorial analysis procedure),
a technique that identifies common motifs in drugs
based on fragmenting molecules around bonds formed
by common reactions (46) (Fig. 6.7). Extracting
rules from RECAP for virtual screening represents
a possible way of addressing the question of ease
of synthesis of compounds. A similar approach to
assess the occurrence of structural motifs in drug
molecules was presented by Wang and Ramnarayan,
who developed the concept of multilevel chemical
compatibility (MLCC) between a drug database and a
test molecule as a measure for druglikeness. In
the MLCC method a compound is recognized as
druglike if all of its topological motifs occur in
other known drugs.
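
The acceptance rule stated for MLCC amounts to a subset test. The following sketch assumes that motifs have already been extracted and are represented by hashable labels; the motif names in the usage note and both function names are hypothetical, and the multilevel aspect of the published method is not modeled.

```python
def unmatched_motifs(test_motifs, drug_motifs):
    """Motifs of the test molecule that never occur in the drug database."""
    return set(test_motifs) - set(drug_motifs)


def is_druglike_mlcc(test_motifs, drug_motifs):
    """True if every motif of the test molecule also occurs in known drugs."""
    return not unmatched_motifs(test_motifs, drug_motifs)
```

For example, with drug motifs {"benzene", "piperidine", "amide"}, a test molecule with motifs {"benzene", "amide"} would pass, whereas one containing an unseen motif would be rejected and unmatched_motifs would report which motif caused the rejection.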



2.1.4 Pharmacophore Point Filter. The topological
drug fragmentation approaches discussed above
suggest that the occurrence of a relatively small
number of frameworks (ring structures and
linkers), an even smaller number of side chains,
and a small number of polar groups characterize
drugs very well. Although drugs and nondrugs are
not completely distinguishable, it has been
observed that drugs differ somewhat from nondrugs
in their possession of hydrophobic moieties that
are well functionalized. Nondrugs often contain
underfunctionalized hydrophobic groups (Fig. 6.8).
Recent work to characterize the druglikeness of
molecules focuses more on the presence of key
functional groups in molecules.

A simple pharmacophore point filter has 
been introduced recently (47). It is based on 
the assumption that druglike molecules 
should contain at least two distinct pharma- 
cophore groups (47). Four functional motifs 
have been identified that guarantee hydrogen- 
bonding capabilities that are essential for the 
specific interaction of a drug molecule with its 
biological target (Fig. 6.9). These motifs can be 
combined to functional groups that are also 
referred to here as pharmacophore points; 
they include: amine, amide, alcohol, ketone, 
sulfone, sulfonamide, carboxylic acid, carbam- 
ate, guanidine, amidine, urea, and ester. The 
following main rules apply to the pharmacophore
point filter (PF1):

• Pharmacophore points are fused and counted as one
if they are separated by less than two carbon
atoms.

• Molecules with fewer than two or more than seven
pharmacophore points fail the filter.

• Amines are considered pharmacophore points but
not azoles or diazines.

• Compounds with more than one carboxylic acid are
dismissed.

• Compounds without a ring structure are dismissed.

• Intracyclic amines in the same ring are fused to
one pharmacophore point.

Figure 6.8. Number of pharmacophore points in drug
databases (MDDR + CMC) and reagent databases (ACD).
Horizontal axis: number of pharmacophore points
(0 to >=10).

The requirement of two distinct pharmaco- 
phore points neglects at least one very impor- 
tant class of drugs: biogenic amine-containing 
CNS drugs. Therefore, a second pharmaco- 
phore filter has been designed that requires 
only one pharmacophore point in small mole- 
cules of the type amine, amidine, guanidine, or 
carboxylic acid (PF2). 
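
The counting rules of PF1 and the relaxed PF2 variant can be sketched as two small predicates. Several assumptions are made here: the pharmacophore points are taken to be already detected and fused according to the rules above, the function names are hypothetical, and PF2 is read as "exactly one pharmacophore point of an allowed type" because the text does not define what counts as a small molecule.

```python
def passes_pf1(n_points, n_carboxylic_acids, has_ring):
    """PF1: two to seven fused pharmacophore points, at most one acid, a ring."""
    if not has_ring:
        return False
    if n_carboxylic_acids > 1:
        return False
    return 2 <= n_points <= 7


# Point types that PF2 accepts as a single pharmacophore point.
PF2_TYPES = {"amine", "amidine", "guanidine", "carboxylic acid"}


def passes_pf2(point_types):
    """PF2 (one possible reading): a single point of an allowed type."""
    return len(point_types) == 1 and point_types[0] in PF2_TYPES
```

A screening setup might apply passes_pf1 for non-CNS projects and accept a compound that passes either predicate for CNS projects; that combination is a design choice and is not prescribed by the text.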

An analysis of drug databases and reagent-type
databases reveals that about two thirds of drugs
and nondrugs can be classified correctly by PF1.
This performance is not as impressive as that of
neural networks. However, as a filter for virtual
screening, pharmacophore point filters offer some
advantages. First, the occurrence and count of
pharmacophore points can be evaluated on the
building-block level of a virtual combinatorial
library. Unlike druglikeness neural nets, no
enumeration is necessary. Second, the results of
the pharmacophore point filter can be easily
interpreted. Third, the settings of the filter can
be easily adjusted (e.g., PF1 for non-CNS drugs,
PF2 for CNS drugs).

Figure 6.9. Functional motifs of drugs used to
build pharmacophore points.

2.2 Focused Screening Libraries for Lead 
Identification 

Without the knowledge about specific drug 
targets it is sometimes useful to apply virtual 
screening for the design of focused libraries of 
a few thousand compounds rather than to find 
a small number of hits to be tested against a 
specific target. To save resources it may some- 
times be more prudent not to run the entire 
HTS file against a target protein; instead, a 
focused library with higher chances of con- 
taining hits may be scrutinized. Those focused 
libraries may be designed to target specific 
protein families such as GPCRs, kinases, or 
nuclear hormone receptors. They can also be 
enriched with privileged structures that occur
more often in drug molecules and/or were found to
inhibit members of the protein family.



2.2.1 Targeting Protein Families. Target 
class-directed libraries can be built from avail- 
able compounds or be synthesized in combina- 
torial fashion. The design of target class-di- 
rected libraries relies on the identification of 
structural motifs in small molecules or in 
building blocks for combinatorial libraries 
that can be linked to increased activity for the 
target class. Functional groups that show the 
propensity to hit a certain target class can be 
found by examining ligands from the litera- 
ture. Recurring motifs for GPCRs include, for 
example, piperazines, morpholines, and pip- 
eridines; for kinases they include, for example, 
heterobicyclic compounds or pyrimidines.
Compounds bearing those structural motifs 
are thought to have a generally higher chance 
to be active against the respective target 
classes. A more rigorous approach to identify 
the "GPCR-likeness" of compounds or building
blocks can be provided by a statistical analysis
of druglike databases. Neural networks have been
shown to be particularly useful in classifying
chemical matter, such as CNS-active compounds
(26, 48).

A neural network approach similar to that of
Sadowski and Kubinyi (27) has been described
recently to address the "GPCR-likeness" of small
molecules as well as building blocks for
combinatorial libraries (49). A feed-forward
neural net was trained using 5000 compounds from
the MDDR that target GPCRs and 5000 compounds that
target other protein classes. Using the
"activity-class" field of the database, about
20,000 GPCR-like and 55,000 non-GPCR-like
compounds have been identified by entries such as
5HT, leukotriene, and PAF. The resulting neural
net classifies GPCR-like compounds correctly with
80% accuracy. An independent test of compounds in
our proprietary database that were found to hit
GPCRs or other targets showed a correct prediction
of GPCR-like compounds in 70% of the cases. When
several virtual combinatorial libraries were
analyzed, it turned out that the property of being
GPCR-like could be attributed to the GPCR-likeness
of the building blocks alone; that is, the
GPCR-likeness of the enumerated compound
correlated very well with the GPCR-likeness of the
most GPCR-like building block it contained. This
offers an important advantage for the design of
combinatorial libraries because, for large virtual
libraries, the computer cost of enumeration grows
with the power of the number of R groups and thus
very quickly becomes impractical. For instance,
for a 3-R-group library with 1000 building blocks
each, the enumerated library would contain 1
billion compounds to be analyzed, whereas the
building block-level analysis needs to examine
only 3000 compounds. Figure 6.10 shows a list of
amine building blocks extracted from the ACD that
were found to be most GPCR-like by the neural net.
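
The arithmetic behind this argument is easy to verify. The sketch below compares the size of the enumerated library with the number of building blocks that must be scored, and approximates the score of a product by its most GPCR-like building block, as the observed correlation suggests. All function names are hypothetical and the building-block scores are assumed to come from a trained classifier.

```python
def enumerated_size(blocks_per_position):
    """Number of products when every combination of building blocks is made."""
    size = 1
    for n in blocks_per_position:
        size *= n
    return size


def building_block_evaluations(blocks_per_position):
    """Number of building blocks to score when working at the block level."""
    return sum(blocks_per_position)


def product_gpcr_likeness(block_scores):
    """Approximate a product's score by its most GPCR-like building block."""
    return max(block_scores)
```

For three R-group positions with 1000 building blocks each, enumerated_size([1000, 1000, 1000]) gives 1,000,000,000 products, whereas building_block_evaluations gives 3000 evaluations.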

Not every portion of a GPCR-like molecule 
has to be GPCR-like. The presence of one 
GPCR-like moiety (building block or core 
structure) is sufficient to make a compound 
GPCR-like. Therefore, the neural network of- 
fers two different strategies for the design of 
GPCR-like libraries: (1) GPCR-like core +
druglike building blocks (need not be GPCR- 
like); (2) non-GPCR-like core + GPCR-like 
building blocks. Virtual screening of a data- 
base of existing compounds using the de- 
scribed neural net can be applied to assemble a 
focused screening library. Alternatively, com- 
binatorial libraries can be designed. 

2.2.2 Privileged Structures. Privileged struc- 
tures are structural types of small molecules 
that are able to bind with high affinity to multi- 
ple classes of receptors (50). An enrichment of 
libraries with privileged structures may in- 
crease the chance of finding active compounds. 
Examples of privileged structures include ben- 
zazepine analogs found to be effective ligands 
for an enzyme that cleaves the peptide angioten- 
sin I, whereas others are effective CCK-A recep- 
tor ligands. Cyproheptadine derivatives were 
found to have peripheral anticholinergic, antise- 
rotonin, antihistaminic, and orexigenic activity. 
Hydroxamate and benzamidine derivatives 
have been shown to be privileged structures for 
metalloproteases and serine proteases, respec- 
tively. For the class of 7-transmembrane G-pro- 
tein-coupled receptors a large number of privi- 
leged structures has been found including, for 
example, diphenylmethane, diazepine, benzazepine,
biphenyltetrazole, spiropiperidine, indole, and
benzylpiperidine (51). Some ubiquitously
privileged structures have recently been
identified (52). They include carboxylic acids,
biphenyls, diphenylmethane, and, to a lesser
extent, naphthyl, phenyl, cyclohexyl, dibenzyl,
benzimidazole, and quinoline.

Figure 6.10. Selection of GPCR-like amines from the ACD.

2.3 Pharmacophore Screening 

In cases where no structural information 
about the target protein is given, pharmaco- 
phore models can provide powerful filter tools 
for virtual screening (53). Even in cases where 
the protein structure is available, pharma- 
cophore filters should be applied early because 
they are generally much faster than docking 
approaches (discussed below) and can, there- 
fore, greatly reduce the number of compounds 
subjected to the more expensive docking appli- 
cations. For example, a pharmacophore model 
consisting of three pharmacophore points can 
be tested against about 10⁶ compounds in a few
minutes of computer time [disregarding the time it
takes to generate three-dimensional (3D)
conformations of each molecule] (10). Another
interesting aspect of pharmacophores in virtual
screening is 3D-pharmacophore diversity. Although
the diversity concept for virtual compounds in
general is not
applicable because of the enormity of the 
chemical space, diversity in pharmacophore 
space is a feasible concept. Virtual libraries 
can therefore be optimized for covering a wide 
pharmacophore space. 

2.3.1 Introduction to Pharmacophores. In 1894 Emil
Fischer proposed the “lock-and-key” hypothesis to
characterize the binding of compounds to proteins
(54). This can be considered the first attempt to
explain binding of small molecules to a biological
target. Proteins recognize substrates through
specific interactions. It is a challenge for the
medicinal chemist to synthesize compounds that can
capture the 3D arrangement of functional groups in
a small molecule that forms the pharmacophore and
that is responsible for substrate binding to the
protein. The first definition of the pharmacophore
formulated by Paul Ehrlich was "a molecular
framework that carries (phoros) the essential
features responsible for a drug's (pharmacon)
biological activity" (55). This definition was
slightly modified by Peter Gund to "a set of
structural features in a molecule that is
recognized at a receptor site and is responsible
for that molecule's biological activity" (56). An
example is shown in Fig. 6.11. An X-ray structure
of CDK2 complexed with the adenine-derived
inhibitor H717 (57-59) has been solved.
Interactions that are essential to substrate and
inhibitor binding to the enzyme will form the
pharmacophore that should be captured by
inhibitors binding the same way H717 does. As
shown in Fig. 6.11, the inhibitor binds to the
hinge region (Phe82 and Leu83) through two
hydrogen bonds, to a hydrophobic region through
the cyclopentyl group, and to Asp145 and Asn132
through hydrogen bonds. The pharmacophore that
reflects these interactions has a hydrogen-bond
donor and a hydrogen-bond acceptor pair that
ensures binding to the hinge region, a hydrophobic
group that corresponds to the cyclopentyl binding
site, and a hydrogen-bond donor that ensures
binding to Asp145 and/or Asn132. Note that in
addition to distances that describe the 3D
relationship among pharmacophore points, angles,
dihedrals, and exclusion volumes are also used.
Each additional restraint can reduce the number of
hits, thus making the compound selection easier
for testing. Pharmacophore hypotheses for
searching can be generated using structural
information from active inhibitors, ligands, or
from the protein active site itself (60, 61).

Figure 6.11. Pharmacophore derived based on the
interactions between human cyclin-dependent kinase
2 and the adenine-derived inhibitor H717 as
observed in the X-ray structure of the complex
(PDB entry 1G5S). Dashed lines highlight
hydrogen-bonding interactions. HBD, hydrogen-bond
donor; HBA, hydrogen-bond acceptor. The hinge
region links the N- and C-terminal domains of a
kinase.

Figure 6.12. Examples of SMILES notations for two
compounds obtained using CACTVS and Daylight.
Recoverable figure text (CACTVS):
COC(=O)C1C2CCC(CC1c3ccccc3)N2C



2.3.2 Databases of Organic Compounds. Virtual
screening is used in general for selecting
potentially active compounds from databases of
compounds available either in-house or from a
vendor. Because virtual screening is not accurate
enough to identify only active compounds as hits,
it is less risky to screen databases of existing
compounds than to synthesize a new library.
Nevertheless, virtual libraries that can be
synthesized through combinatorial chemistry and/or
rapid analoging can easily be generated using in
silico methods. These libraries are more often
generated for lead optimization and synthesis
prioritization (62, 63).

There is a wealth of databases that code 
available compounds typically in the two-di- 
mensional standard data (2D-SD) format in- 
cluding connectivity from MACCS (32, 64). 
The most common databases are the Available 
Chemicals Directory (ACD) (42), Spresi (65), 
Chemical Abstracts Database (66), and the 
National Cancer Institute Database (67, 68). 
Many vendors of chemicals also provide searchable
databases with 2D-structure and property
information of their compounds. Sometimes
compounds are coded in linear representations such
as the SMILES (69, 70) notation. The SMILES codes
obtained using CACTVS and Daylight programs for
4-benzyl pyridine and R-cocaine are shown in Fig.
6.12.

The primary source of 3D experimental 
structures of organic molecules is the Cam- 
bridge Structural Database (71). Alterna- 
tively, 2D databases of organic compounds can 
be converted into 3D databases using several 
software programs (72). Each program starts 
with generating a crude structure that is sub- 
sequently optimized using a force field. CON- 
CORD (73) applies rules derived from experi- 
mental structures and a univariate strain 
function for building an initial structure.
CORINA (74) generates an initial structure by use
of a standard set of bond lengths, angles and
dihedrals, and rules for cyclic systems. RUBICON
(75) invokes distance geometry techniques to
generate 3D structures based on connectivity
tables. This program also uses bond lengths and
angle tables to build a matrix containing the
upper and lower bounds for distances between all
atoms in the molecule. OMEGA (76) uses a
torsion-driven approach for building conformers.
It generates low energy conformers for each
molecule by assembling it from fragments and
searching through possible orientations of the
subunit added. WIZARD (77) and COBRA (78), AIMB
(79) and MIMUMBA (80) employ artificial
intelligence techniques for generating a set of
user-specified low energy conformations for a
compound. MOLGEO (81) uses a depth-first approach
for generating 3D structures based on connectivity
using bond length and bond angle tables. IDEALIZE
(82) is a molecular mechanics program that
minimizes 2D structures to generate the
corresponding 3D structure.



2.3.3 2D Pharmacophore Searching. Searching 2D
databases is of great importance for accelerating
drug discovery. Chemical suppliers provide
databases of purchasable compounds that medicinal
chemists search for starting material for
synthesis or analogs of a lead compound. Different
strategies are pursued to search a 2D database to
identify compounds of interest. Exact structure
search is applied to find out whether a compound
is present in the database. Substructure searches
identify larger molecules that contain the
user-defined query, irrespective of the
environment in which the query substructure occurs
(83) (Fig. 6.13). Furthermore, substructure
searching can identify all compounds in a database
that share the same core structure. Biochemical
data obtained from testing these compounds can be
used for generating structure-activity
relationships (SARs), even before synthetic plans
are made for lead optimization (84). In contrast,
superstructure searches are used to find smaller
molecules that are embedded in the query (Fig.
6.14). One problem that arises from substructure
searches is that the number of compounds
identified can reach into the thousands. A
solution to this problem is ranking the compounds
based on similarity to a reference compound.
Similarity searches use one or more structural
descriptors for quantifying the similarity between
compounds in the database and in the query (85,
86) (Fig. 6.15). A review of descriptors used in
similarity searches is provided by Willett et al.
(86). Beyond structural similarity, activity
similarity has also been the subject of several
studies. Xue et al. showed that compounds with
similar activity could be identified using
mini-fingerprints (87-89), physicochemical
property descriptors (90), or latent semantic
structure indexing (91, 92). In addition,
similarity searches can be combined with
superstructure searches for limiting the number of
compounds selected. Flexible match searches are
used for identifying compounds that differ from
the query structure in user-specified ways. In
addition, isomer, tautomer, and parent molecule
searches may be done to find in a database
isomers, tautomers, or parent molecules of the
query.
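
Ranking substructure hits by similarity to a reference compound can be sketched as follows. The text does not name a particular similarity measure; the Tanimoto coefficient on sets of fingerprint features is assumed here because it is a common choice. The function names and the representation of a fingerprint as a set of integers are illustrative assumptions.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of features."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def rank_by_similarity(reference_fp, database):
    """Sort database entries (name -> fingerprint) by similarity, best first."""
    scored = [(name, tanimoto(reference_fp, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With a ranked list in hand, only the top-scoring compounds from a substructure search that returned thousands of hits need to be inspected or purchased.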



2.3.4 3D Pharmacophores 
2.3.4.1 Ligand-Based Pharmacophore Generation.
Ligand-based pharmacophores are
typically used when the crystallographic, solu- 
tion structure, or modeled structure of a pro- 
tein cannot be obtained. When a set of active 
compounds is known and it is hypothesized 
that all compounds bind in a similar way to the 
protein, then common groups should interact 
with the same protein residues. Thus, a phar- 
macophore capturing these common features 
should be able to identify from a database 
novel compounds that bind to the same site of 
the protein as the known compounds do. The 
process of deriving a pharmacophore, called 
pharmacophore mapping, consists of three 
steps: (1) identifying common binding ele- 
ments that are responsible for biological activ- 
ity; (2) generating potential conformations 
that active compounds may adopt; and (3) de- 
termining the 3D relationship between phar- 
macophore elements in each conformation 
generated. To build a pharmacophore based 
on a set of active compounds, two methods are 
usually applied. One method is to generate a 
set of minimum energy conformations for 
each ligand and search for common structural 
features. Another method is to consider all 
possible conformations of each ligand to eval- 
uate shared orientations of common func- 
tional groups. Analyzing many low energy 
conformers of active compounds can suggest a range
for the distance between key groups that takes
into account the flexibility of the ligands and of
the protein. This task can be performed either
manually or automatically.

Figure 6.13. Compounds identified from the ACD database through substructure search.



2.3.4.2 Manual Pharmacophore Generation. Manual
pharmacophore generation is
used when there is an easy way to identify the 
common features in a set of active compounds 
and/or there is experimental evidence that 
some functional groups should be present in 
the ligand for good activity. An example is the 
development of a pharmacophore model for 
dopamine-transporter (DAT) inhibitors (Fig. 
6.16). In the first step common structural fea- 
tures were identified in the selected five DAT 
inhibitors (93-95) (Fig. 6.16, circles). Four out 
of five compounds were structurally rigid, 
whereas the 4-hydroxy piperidinol was flexi- 
ble. A systematic conformational search for 
4-hydroxy piperidinol identified 10 possible 
conformations. Measuring distances among 
pharmacophore elements in every inhibitor
and every conformation considered led to the
distance ranges among pharmacophore points 
shown in Fig. 6.16. Because proteins are flex- 
ible, pharmacophores should also have some 
flexibility built in, thus justifying the use of 
distance ranges. 
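
The measuring step of manual pharmacophore mapping can be sketched as collecting, for every pair of pharmacophore points, the smallest and largest distance seen over all compounds and conformations. Everything in the sketch is illustrative: the function names are hypothetical, each conformer is assumed to be a dictionary mapping a pharmacophore point name to its (x, y, z) coordinates, and the optional tolerance is an assumed extra margin.

```python
import math
from itertools import combinations


def distance_ranges(conformers):
    """Return {(point_a, point_b): (min_distance, max_distance)} over all conformers."""
    ranges = {}
    for coords in conformers:
        for a, b in combinations(sorted(coords), 2):
            d = math.dist(coords[a], coords[b])
            lo, hi = ranges.get((a, b), (d, d))
            ranges[(a, b)] = (min(lo, d), max(hi, d))
    return ranges


def fits_ranges(coords, ranges, tolerance=0.0):
    """True if every pairwise distance of a candidate falls within its range."""
    for (a, b), (lo, hi) in ranges.items():
        d = math.dist(coords[a], coords[b])
        if not (lo - tolerance <= d <= hi + tolerance):
            return False
    return True
```

The resulting ranges play the role of the distance ranges shown in Fig. 6.16, and fits_ranges shows how such a query could later be checked against a single rigid conformation of a database compound.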

2.3.4.3 Automatic Pharmacophore Generation.
Pharmacophore generation through conformational
analysis and manual alignment is a very
time-consuming task, especially when the list of
active ligands is large and the elements of the
pharmacophore model are not obvious. There are
several programs, HipHop (96), HypoGen (97), Disco
(98), Gasp (99), Flo (100), APEX (101), and ROCS
(102), that can automatically generate potential
pharmacophores from a list of known inhibitors.
The performance of these programs in automated
pharmacophore generation varies depending on the
training set. The use of these programs for
pharmacophore generation was recently reviewed in
detail (103). Here we focus on common features of
these programs. All programs use algorithms that
identify common pharmacophore features in the
training set molecules; they use scoring functions
to rank the identified pharmacophores. The
following features are identified in each
molecule: hydrogen-bond donors, hydrogen-bond
acceptors, negative and positive charge centers,
and surface accessible hydrophobic regions that
can be aliphatic, aromatic, or nonspecific. Most
of the programs consider ligand flexibility when
generating pharmacophores because compounds might
not bind to the protein in the minimum energy
conformation.

Figure 6.16. Manual pharmacophore mapping by measuring distances between pharmacophore
points in every compound and conformation considered. Pharmacophore elements are highlighted
with circles. All structures were built and minimized using QUANTA. Conformers of 4-hydroxy
piperidinol were generated using the Grid Scan method from QUANTA, followed by clustering, to
identify unique conformers.

2.3.4.4 Receptor-Based Pharmacophore Generation.
If the 3D structure of a receptor is
known, a pharmacophore model can be de- 
rived based on the receptor active site. Bio- 
chemical data can be used for identifying key 
residues that are important for substrate 
and/or inhibitor binding. This information can 
be used for building pharmacophores target- 
ing the region defined by key residues or for 
choosing among pharmacophores generated 
by an automated program. This can greatly 
improve the chance of finding small molecules 
that inhibit the protein because the search is 
focused on a region of the binding site that is 
crucial for binding substrates and inhibitors. 
Many ligands bind to proteins through nonbonded
interactions such as hydrogen bonds and
hydrophobic interactions. Programs such as LUDI
(104-106) or POCKET (107) can use the structure of
the protein to generate interaction sites or grids
to characterize favorable positions that ligand
atoms should occupy. Four types of interaction
sites are characterized: hydrogen-bond donors,
hydrogen-bond acceptors, and hydrophobic groups
that can be lipophilic-aliphatic or
lipophilic-aromatic. LUDI-generated interaction
maps for Cerius² Structure-Based Focusing (108) do
not differentiate between aliphatic and aromatic
interaction sites. This is based on the
observation by Burley and Petsko (109) that,
besides aromatic side chains, aliphatic and
aromatic side chains also pack closely to form the
hydrophobic core of proteins. Because proteins are
not rigid, Carlson et al. (110) proposed using
molecular dynamics simulation for generating a
set of diverse protein conformations to include 
protein flexibility in the pharmacophore de- 
velopment. In this case distance ranges be- 
tween pharmacophores are obtained by exam- 
ining several conformations of the protein. 
This technique is similar to the one used for 
the generation of flexible pharmacophores 
(Fig. 6.16), based on active compounds, when 
several conformations of the compound and/or 
many compounds are considered for pharma- 
cophore mapping. 

2.3.5 Pharmacophore-Based Virtual Screen- 
ing. Pharmacophore-based virtual screening 
is the process of matching atoms and/or func- 
tional groups and the geometric relations be- 
tween them to the pharmacophore in the 
query. Examples of programs that perform 
pharmacophore-based searches are 3Dsearch
(111), Aladdin (53), UNITY (112), MACCS-3D 
(113), Catalyst (114), and ROCS (102). There 
are also web-based applications (115, 116) that 
can perform pharmacophore searches. Usu- 
ally pharmacophore-based searches are done 
in two steps. First, the software checks 
whether the compound has the atom types 
and/or functional groups required by the phar- 
macophore; then it checks whether the spatial 
arrangement of these elements matches the 
query. The fastest approach used in the 
matching step is considering rigid compounds. 
Because molecules that are not rigid might 
have a conformation that matches the phar- 
macophore, flexibility of the ligands should be 
considered. Flexible 3D searches identify a 
higher number of hits than rigid searches do 
(117). However, flexible searches are more 
time consuming than rigid ones. There are 
two main approaches for including conforma- 
tional flexibility into the search: one is to gen- 
erate a user-defined number of representative 
conformations for each molecule when the da- 
tabase is created; the other is to generate con- 
formations during the search. By use of the 
first approach, any rigid search program can 
be used for doing a flexible search; however, 
generating the database takes more time and 
disk space. The second approach gives more 
flexibility to the user, given that a larger num- 
ber of conformations can be generated for each 
molecule during the search. In this case the
database search requires more computer re- 
sources; however, this approach will not miss 
conformations that fit the query but were not 
stored in the database. Pharmacophore que- 
ries that define distance ranges between phar- 
macophore elements compensate for possible 
conformational changes in the receptor site 
upon ligand binding. Also, these flexible phar- 
macophore queries compensate for the differ- 
ence between using multiconformer databases 
and generating conformers during the search. 
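
The two-step search described above can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm of any of the programs cited: the data layout (`conformers`, `features`, `query`) and the function name are assumptions made for the example, and the conformers are taken to be pregenerated, as in the first of the two flexibility approaches.

```python
from itertools import product
from math import dist


def matches_query(conformers, features, query):
    """Two-step pharmacophore check for one compound (hypothetical layout).

    conformers: list of dicts mapping atom index -> (x, y, z)
    features:   dict mapping feature type -> list of atom indices of that type
    query:      dict with "types" (list of required feature types) and
                "ranges" mapping (i, j) query positions -> (min_dist, max_dist)
    """
    # Step 1: the compound must contain every required feature type.
    if any(not features.get(t) for t in query["types"]):
        return False
    # Step 2: at least one stored conformer must place the features
    # within all distance ranges of the query.
    candidates = [features[t] for t in query["types"]]
    for coords in conformers:
        for assignment in product(*candidates):
            if len(set(assignment)) < len(assignment):
                continue  # assumption: one atom cannot serve as two query points
            if all(lo <= dist(coords[assignment[i]], coords[assignment[j]]) <= hi
                   for (i, j), (lo, hi) in query["ranges"].items()):
                return True
    return False
```

Because the query stores distance ranges rather than single distances, the same check serves both rigid and flexible pharmacophore queries; only the number of conformers passed in changes.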

ROCS uses a shape-based superposition
for identifying compounds that have similar 
shape. Grant and Pickup (118) showed that 
using atomic-centered Gaussians instead of a 
spherical function can dramatically reduce the 
time required for a shape alignment of two 
molecules. This improved routine allows the 
program to perform shape-based database 
searches at an acceptable speed (300-400 
conformers/s). 
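
The speed advantage of atom-centered Gaussians comes from the fact that the overlap integral of two Gaussians has a closed form. The sketch below illustrates this for two molecules given as lists of atom coordinates; it is not the ROCS implementation. The amplitude `P`, the single atomic radius `RADIUS`, the restriction to pairwise (first-order) overlaps, and the absence of any alignment optimization are all simplifying assumptions of the example.

```python
from math import dist, exp, pi

P = 2.7        # assumed Gaussian amplitude (illustrative value)
RADIUS = 1.7   # assumed atomic radius in angstroms, same for every atom
# Width chosen so that one Gaussian has the volume of a sphere of RADIUS.
ALPHA = pi * (3 * P / (4 * pi)) ** (2 / 3) / RADIUS ** 2


def overlap_volume(mol_a, mol_b):
    """First-order overlap: sum of pairwise Gaussian overlap integrals."""
    prefactor = P * P * (pi / (2 * ALPHA)) ** 1.5
    return sum(prefactor * exp(-ALPHA * dist(a, b) ** 2 / 2)
               for a in mol_a for b in mol_b)


def shape_tanimoto(mol_a, mol_b):
    """Shape similarity of two molecules in their current orientation."""
    v_ab = overlap_volume(mol_a, mol_b)
    v_aa = overlap_volume(mol_a, mol_a)
    v_bb = overlap_volume(mol_b, mol_b)
    return v_ab / (v_aa + v_bb - v_ab)
```

A shape-based search would maximize this similarity over rigid-body orientations of one molecule; that optimization step is omitted here.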

There are several methods for generating 
conformers during in silico screening. Torsion
optimization (119) is used for minimizing the
root-mean-square (rms) deviation between
the constraints from the pharmacophore and 
the corresponding distances in the compound. 
The "directed tweak" (120) algorithm also 
uses torsion optimization for minimizing the 
sum of the squared deviations between dis- 
tances in the pharmacophore and the corre- 
sponding ones in the compound. Chem- 
DBS-3D (121) generates low energy 
conformations that can match the pharma- 
cophore using rules similar to those in WIZ- 
ARD (77). The distance geometry algorithm 
(122) uses bond length and bond angle infor- 
mation for building a matrix containing upper 
and lower limits of distances between atoms in 
the organic compound. These distances can be 
used for building the conformation that fits 
the pharmacophore query. The systematic 
search method (123) is feasible for molecules 
with few rotatable bonds and thus has limited 
applicability. 
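
The limited applicability of the systematic search follows from simple counting: if every rotatable bond is sampled at a fixed torsion increment, the number of conformers grows exponentially with the number of rotatable bonds. The short sketch below makes this explicit; the 30 degree increment is an assumption chosen for illustration.

```python
def systematic_conformer_count(n_rotatable_bonds, step_degrees=30):
    """Number of torsion combinations in an exhaustive grid search."""
    steps_per_bond = 360 // step_degrees
    return steps_per_bond ** n_rotatable_bonds
```

With these assumptions, three rotatable bonds give 1728 combinations, whereas eight rotatable bonds already give more than 400 million.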

2.4 Structure-Based Virtual Screening 

In direct analogy to high throughput screen- 
ing, docking and scoring techniques can be ap- 
plied to computationally screen a database of 
hundreds of thousands of compounds against 



a specific target protein. Computational methods
that predict the 3D structure of a protein-ligand
complex are often referred to as molecular
docking approaches (Fig. 6.17) (124).
Protein structures can be employed to dock
ligands into the binding site of the protein and
to study their interactions (125). For virtual
screening, the crucial task at hand is the fast
and reliable ranking of a database of putative
protein-ligand complexes according to their
binding affinities. Depending on ligand and
protein flexibility, sampling depth, and
optimization schemes, docking programs used today
(Table 6.2) can accomplish this task within a few
minutes or sometimes seconds per processor
and molecule. As a computational task, virtual
screening is trivially parallel
because the protein-ligand docking
events are completely independent of each
other. Although docking was initially
developed as a specialist modeling tool run on
computer workstations, inexpensive
Linux clusters or distributed computing
over networked PCs can now be used for virtual
screening. This raises the in silico
throughput to roughly 100,000
compounds per day on a Linux cluster, thereby
reaching the speed of today's high throughput
screens. Energy functions that evaluate the




Figure 6.17. Crystal structure (PDB entry 1a4q) of
the neuraminidase inhibitor zanamivir bound in the
active site (213).
Table 6.2 Selection of Available Protein-Ligand Docking Software for Structure-Based
Virtual Screening

Docking Program | Docking/Sampling Method | Scoring Method
GLIDE (www.schrodinger.com) | Rigid protein; multiple conformation rigid docking; grid-based energy evaluation | Empirical scoring, including penalty term for unformed hydrogen bonds; force-field scoring
DOCK (www.cmpharm.ucsf.edu/kuntz/dock.html) | Rigid protein; flexible ligand docking (incremental construction) | Force-field scoring; chemical scoring, contact scoring
FlexX (cartan.gmd.de/FlexX) | Rigid protein; flexible ligand docking (incremental construction) | Empirical scoring intertwined with sampling
DockVision (www.dockvision.com) | Monte Carlo, genetic algorithm | Various force fields
DockIT (www.daylight.com/meetings/emug00/Dixon) | Ligand conformations generated inside binding-site spheres using distance geometry | PLP, PMF
FRED (www.eyesopen.com/fred.html) | Exhaustive sampling; rigid protein, multiple conformation rigid docking | Chemscore, PLP, ScreenScore, and Gaussian shape scoring
LigandFit (www.accelrys.com) | Monte Carlo | LIGSCORE, PLP, PMF, LUDI
Gold (www.ccdc.cam.ac.uk/prods/gold/) | Genetic algorithm | Soft core vdW potential and hydrogen bond potentials



binding free energy between protein and li- 
gand sometimes employ rather heuristic 
terms. Therefore, those functions are more 
broadly referred to as scoring functions. 

2.4.1 Protein Structures. A 3D protein structure
of the receptor at atomic resolution is necessary
to start a protein-ligand docking experiment.
The exponential growth of solved
crystal and solution structures in recent years
provides a reliable source of protein structures.
The Protein Data Bank (PDB) currently
holds more than 18,000 protein structures. It 
should be noted, however, that the chances of 
a successful virtual screen very much depend 
on the quality of the available structure. The 
crystal structure should be well refined;
typically a resolution of 2.5 Å or better is considered
to be necessary (126). Small changes in struc- 
ture can drastically alter the outcome of a 



computational docking experiment (127). 
Moreover, many receptor sites are flexible; 
they often undergo conformational changes 
upon ligand binding. A good example is the 
Tyr248 movement of carboxypeptidases upon 
substrate or ligand binding, which has pro- 
vided the first structural perspective of Kosh- 
land's induced-fit hypothesis (128, 129). Pro- 
teins have to be studied carefully in every 
individual case to decide how promising a vir- 
tual screen may be. 

For many protein drug targets crystal or 
solution structures are not available. In such 
cases homology models (130,131) and pseudo- 
receptor models (132) are often used. How- 
ever, unless there is a very high conservation 
of receptor site residues the use of homology 
models for virtual screening is much riskier 
than using solved structures. On the other 
hand, the PDB contains a wealth of protein 
Table 6.3 Occurrence of Selected Protein
Classes Currently Identified in the Human
Genome Database (GDB) and the Protein
Data Bank (PDB)

Class | GDB(a) | PDB(b)
Nuclear receptors | 49 | 42
G-protein-coupled receptors | 408 | 1
Kinase | 945 | 625
Protease | 190 | 330
Peptidase | 108 | 128
Esterase | 106 | 87
Reductase | 210 | 417
Synthase | 191 | 335
Lyase | 38 | 70
Hydrolase | 131 | 110
Transferase | 500 | 467
Anhydrase | 27 | 156
Sulfatase | 26 | 6
Dehydrogenase | 347 | 338
Desaturase | 10 | 1
Phosphatase | 315 | 184
Phosphodiesterase | 63 | 1
Deacetylase | 18 | 2
Transporter | 238 | 1
Channel | 271 | 24

(a) www.gdb.org

(b) www.rcsb.org/pdb; note that the number of structures
available from the PDB often includes several structures of
the same protein.

structures of a wide variety of enzymes and 
receptors that can be used for homology mod- 
eling. Homology models can be built for a large 
number of protein classes coded in the human 
genome (Table 6.3). Because virtual screening 
is so inexpensive and the possible rewards, if 
successful, are so high it is generally war- 
ranted to run a virtual screening experiment, 
even if the chances of success are very small, 
as is often the case when homology models are 
employed. 

2.4.2 Computational Protein-Ligand Dock- 
ing Techniques. Docking ligands into a recep- 
tor site is a geometric search problem. The 
search has to take protein and ligand confor- 
mations as well as their relative orientations 
into account. The receptor conformation is 
typically reasonably well known. However, 
the bioactive conformation of the ligand is 
usually unknown. Nicklaus and coworkers 
showed that force-field energies of bioactive 
conformations of ligands, as represented in 



crystal structures of protein-ligand complexes,
are typically about 25 kcal/mol higher
than minimum conformations in vacuum 
(133). Therefore, the bioactive conformation 
of a ligand is hard to guess and a large number 
of possible ligand conformations have to be 
considered in docking. Most docking ap- 
proaches keep the receptor rigid and the li- 
gand flexible during the docking. Although 
protein flexibility is sometimes included (134- 
136), we will not discuss protein flexibility 
here, given that it is currently rarely used for 
virtual screening because of speed limitations. 
Some relevant concepts of docking approaches 
are briefly discussed below (for a broader review
we refer the reader to Ref. 125). Scoring
protein-ligand complexes will be discussed separately.

2.4.2.1 Rigid Docking. Although ligand and
often also protein flexibility are crucial for
protein-ligand docking, the simpler rigid ligand
docking is sometimes useful. Ligand flexibility
can, for example, be simulated by rigidly docking
an ensemble of preassigned ligand conformations
that represent the relevant conformational
space of the molecule. Algorithms such as
clique search techniques (137) and geometric
hashing (138) are often used to search for
distance-compatible matches of protein and ligand
features (139). Possible features include
complementary hydrogen-bonding interactions,
distances, or volume segments of the receptor site
of the protein or the ligand.

The program DOCK uses an algorithm for 
rigid-body docking based on the idea of search- 
ing for distance-compatible matches. Starting 
with the molecular surface of the protein 
(140-142), a set of spheres is created inside 
the receptor site. The spheres represent the 
volume that could be occupied by a ligand mol- 
ecule (Fig. 6.18). Spheres can represent the 
ligand also; a direct atom representation is 
also possible. Early versions of DOCK relied 
solely on rigid ligand docking. Sets of up to 
four distance-compatible matches were evalu- 
ated. Each set was used for an initial fit of the 
ligand into the receptor site. Additional com- 
patibility matches were used to improve the 
fit. The position of the ligand was then opti- 
mized and scored. 
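
The idea of a distance-compatible match can be illustrated with a brute-force enumeration: a set of (sphere, atom) pairs is compatible when every distance between the chosen sphere centers agrees, within a tolerance, with the distance between the corresponding ligand atoms. The sketch below is only meant to make the definition concrete. It is not the matching algorithm used in DOCK, and the function name, match size, and tolerance are assumptions.

```python
from itertools import combinations, permutations
from math import dist


def distance_compatible_matches(spheres, atoms, size=3, tol=0.5):
    """Yield lists of (sphere_index, atom_index) pairs whose internal
    distances agree within tol angstroms (exhaustive enumeration)."""
    for s_idx in combinations(range(len(spheres)), size):
        for a_idx in permutations(range(len(atoms)), size):
            compatible = all(
                abs(dist(spheres[s_idx[p]], spheres[s_idx[q]])
                    - dist(atoms[a_idx[p]], atoms[a_idx[q]])) <= tol
                for p, q in combinations(range(size), 2)
            )
            if compatible:
                yield list(zip(s_idx, a_idx))
```

Each match found in this way defines a superposition of the ligand onto the sphere centers, which can then serve as an initial placement. The exhaustive loop scales poorly, which is why clique search or geometric hashing is used in practice.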

Since its first introduction in 1982, the
DOCK software has been extended in several
directions. The matching spheres can be
labeled with chemical properties (143) and
distance bins are used to speed up the search
process (144, 145). Recently, the search algorithm
for distance-compatible matches was changed
to the clique-detection algorithm introduced
by Kuhl (139, 146). Furthermore, several
scoring functions are now applied in combination
with the DOCK algorithm (147-151).

Figure 6.18. Receptor site of thyroid receptor beta
filled with spheres (for sake of clarity, sphere
centers are depicted; actual size of spheres is larger, so
that spheres overlap) and thyronine. Crystal
structure taken from the PDB (ID 1bsx).

2.4.2.2 Flexible Ligands. Druglike molecules
are typically flexible, with usually up to
eight rotatable bonds (34). Energetic differ- 
ences between alternative ligand conforma- 
tions are often small compared to the total 
binding affinity between ligand and target 
protein. Also, for flexible ligands it is quite 
common that the bioactive conformations are 
different from the minimum energy confor- 
mations in solution (133). Ligand flexibility is 
typically handled in docking approaches by 
combinatorial optimization protocols such as 
fragmentation, ensembles, genetic algorithms,
or simulation techniques.

In fragmentation approaches, the ligand is
dissected into pieces that are either rigid or
that can be represented by small conformational
ensembles. In docking approaches, a strategy
called incremental construction is typically
used to assemble fragments into whole
molecules directly in the receptor site. Usually,
the largest rigid moiety of the ligand
(sometimes called the anchor) is docked first in
the receptor site. The remaining fragments
are subsequently added in a buildup protocol.
After each incremental buildup step, torsion
angles are sampled and the growing molecule
is minimized.



Ligand flexibility can be artificially in- 
cluded into docking by rigidly docking ensem- 
bles of pregenerated conformations of the li- 
gand into the receptor site. Rigid docking is 
faster than flexible docking by use of a
fragmentation approach. However, because
computing time increases linearly with the
number of conformations, computing time and
coverage of conformational space have to be
balanced. An example of rigid docking of
conformation ensembles is given in Flexibase/FLOG
(152). Distance geometry methods
(153) are used to generate a small set of di- 
verse conformations for each ligand in the da- 
tabase. A subset of up to 25 conformations per 
molecule is selected using rms dissimilarity 
criteria and then docked using a rigid-body- 
docking algorithm. 

Unlike the combinatorial approaches
for docking mentioned above, simulation
methods start with a given configuration
of a ligand in the receptor site. Simulation
techniques such as simulated annealing (154)
are then applied to find energetically more
favorable conformations of the ligand. To speed
up the docking process, docking programs
such as AutoDock (155) precalculate molecular
affinity potentials of the protein on a grid.
Molecular dynamics (MD) methods (see, e.g.,
Refs. 156 and 157) and Monte Carlo simulation
techniques (see, e.g., Refs. 158-162) are also
frequently used in protein-ligand docking
applications.

A variety of other sampling methods are
applied in docking programs, including
genetic algorithms, distance geometry methods,
random searching, hybrid methods, and
generalized effective potential methods. Genetic
algorithms have been employed in programs
such as Gambler (163), AutoDock (155), and
GOLD (126). PRO-LEADS uses an alternative
search technique called "tabu search" (164).
Starting from a random structure, new
structures are created by random moves. A tabu list
is maintained during the optimization phase
and contains the best and the most recently
found binding configurations. Configurations
that resemble those stored in the tabu list are
rejected unless they score better than the best
configuration found so far. The sampling
performance is improved because previously
sampled configurations are avoided. Finally, it
should be mentioned that multistep hybrid docking
procedures have been developed that combine
rapid fragment-based searching with sophisticated
Monte Carlo (MC) or MD simulations (165, 166).
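
The tabu search idea can be sketched on a generic scoring function of a coordinate vector. The following is a simplified illustration and not the PRO-LEADS implementation: the step size, the list length, the similarity measure (largest coordinate difference), and the choice to keep the best configuration separately from the list of recent ones are all assumptions of the example.

```python
import random


def tabu_search(score, start, n_steps=200, n_moves=20, step=0.5,
                tabu_size=25, tabu_radius=0.3, seed=0):
    """Minimize score(vector) by random moves while avoiding recent states."""
    rng = random.Random(seed)
    current = list(start)
    best, best_score = list(start), score(start)
    tabu = [list(start)]  # most recently visited configurations
    for _ in range(n_steps):
        # Generate random moves around the current configuration.
        moves = [[x + rng.uniform(-step, step) for x in current]
                 for _ in range(n_moves)]
        moves.sort(key=score)
        for cand in moves:
            is_tabu = any(
                max(abs(a - b) for a, b in zip(cand, t)) < tabu_radius
                for t in tabu
            )
            # A tabu move is still accepted if it beats the best score so far.
            if not is_tabu or score(cand) < best_score:
                current = cand
                break
        else:
            current = moves[0]  # every move is tabu: take the best one anyway
        tabu.append(current)
        tabu = tabu[-tabu_size:]
        if score(current) < best_score:
            best, best_score = list(current), score(current)
    return best, best_score
```

In a docking context the vector would hold the translational, rotational, and torsional degrees of freedom of the ligand, and `score` would be the scoring function of the docking program.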

2.4.3 Scoring of Protein-Ligand Interactions. 

The problem of sampling the correct binding 
geometry (binding mode) of a protein-ligand 
complex is considered to be solved in many 
docking programs (167). However, to identify 
this correct binding mode by its lowest energy 
or score is a different matter; this is indeed the 
bottleneck of docking-scoring approaches today.
The most important aspect of scoring
functions for virtual screening is speed.
Accuracy requirements are therefore relaxed;
most functions used do not conceptually
describe binding free energies. For this reason,
these functions are typically called scoring
functions rather than energy functions. Three
main scoring strategies are typically used in
docking applications for virtual screening: force field
scoring, empirical scoring, and knowledge-
based scoring.

2.4.3.1 Force Field (FF) Scoring. Nonbonded
interaction energy terms of standard force fields
are typically used in FF scoring, for example, in vacuo
electrostatic terms (sometimes modified by scaling
constants that assume the protein to be an
electrostatic continuum) and van der Waals
(vdW) terms (168-171). DOCK and GREEN
(172) use the intermolecular terms of the AMBER
energy function (173, 174), with the exception
of an explicit hydrogen bonding term (147):

$$E = \sum_{i} \sum_{j} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{D\, r_{ij}} \right) \qquad (6.1)$$

where each term is summed up over ligand
atoms i and protein atoms j. $A_{ij}$ and $B_{ij}$ are the
vdW repulsion and attraction parameters of
the 6-12 potential, $r_{ij}$ is the distance between
atoms i and j, q is a point charge at each of the
atoms, and D is the dielectric constant.
Intraligand interactions are added to the score. Up
to a 100-fold reduction in docking time can be
achieved by precomputing these terms on a 3D
grid that represents the protein during docking
(155, 175). More recently, solvation terms



have been added to FF scores. Examples in- 
clude generalized Born/surface area ap- 
proaches (176) or atomic solvation parameters 
(177-179). 
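
A direct, grid-free evaluation of a force-field score of this type (6-12 vdW term plus Coulomb term, summed over all ligand-protein atom pairs) can be sketched as follows. The atom dictionaries, the geometric-mean combination of per-atom vdW parameters, the default dielectric constant, and the optional unit conversion factor are assumptions of the example; intraligand terms and solvation are not included.

```python
from math import dist, sqrt


def ff_score(ligand, protein, dielectric=4.0, coulomb_constant=1.0):
    """Sum 6-12 vdW and Coulomb terms over all ligand-protein atom pairs.

    Each atom is a dict with "xyz" (coordinates), "q" (point charge) and
    per-atom vdW parameters "A" (repulsion) and "B" (attraction).
    Set coulomb_constant to 332.06 to obtain kcal/mol when charges are in
    elementary charges and distances in angstroms.
    """
    score = 0.0
    for a in ligand:
        for b in protein:
            r = dist(a["xyz"], b["xyz"])
            a_ij = sqrt(a["A"] * b["A"])  # assumed geometric-mean combination
            b_ij = sqrt(a["B"] * b["B"])
            score += a_ij / r ** 12 - b_ij / r ** 6
            score += coulomb_constant * a["q"] * b["q"] / (dielectric * r)
    return score
```

The double loop is what a precomputed grid avoids: the protein contribution at each grid point is calculated once, and scoring a ligand pose reduces to one lookup per ligand atom.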

2.4.3.2 Empirical Scoring. Empirical scor- 
ing functions are multivariate regression 
methods. They fit coefficients of physically 
motivated contributions to binding free en- 
ergy in reproduction of measured binding af- 
finities of a training set of protein-ligand com- 
plexes with known 3D structure. As an 
example, the docking program FlexX (180) 
uses a scoring function similar to that of Böhm
(181, 182). It calculates the sum of free-energy
contributions from the number of rotatable 
bonds in the ligand, hydrogen bonds, ion-pair 
interactions, hydrophobic and pi-stacking in- 
teractions of aromatic groups, and lipophilic 
interactions: 



$$
\begin{aligned}
\Delta G = {} & \Delta G_{0} + \Delta G_{rot} N_{rot} \\
& + \Delta G_{hb} \sum_{\text{neutral H-bonds}} f(\Delta R, \Delta\alpha) \\
& + \Delta G_{io} \sum_{\text{ionic int.}} f(\Delta R, \Delta\alpha) \\
& + \Delta G_{aro} \sum_{\text{aro. int.}} f(\Delta R, \Delta\alpha) \\
& + \Delta G_{lipo} \sum_{\text{lipo. cont.}} f^{*}(\Delta R)
\end{aligned}
\qquad (6.2)
$$

where $\Delta G_{0}$, $\Delta G_{rot}$, $\Delta G_{hb}$, $\Delta G_{io}$, $\Delta G_{aro}$, and
$\Delta G_{lipo}$ are adjustable parameters that are fitted;
$f(\Delta R, \Delta\alpha)$ is a scaling function penalizing
deviations from the ideal geometry; and $N_{rot}$ is
the number of freely rotatable bonds. The
interaction of aromatic groups is an addition to
Böhm's original force-field design (181, 182).
The lipophilic contributions are calculated as
a sum of atom-pair contacts in contrast to
evaluating a surface grid as in Böhm's scoring
function. Böhm's scoring function and its
FlexX implementation are being improved
and additional terms are being tested (see,
e.g., Refs. 182 and 183).
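
The structure of an empirical scoring function of the form of Equation 6.2 can be sketched as a weighted sum of geometry-scaled interaction counts. Everything numerical in the sketch is an assumption made for illustration: the coefficient values in `COEFFS` are placeholders rather than fitted parameters from the source, and the piecewise-linear penalty with its tolerances is only one possible form of the scaling function.

```python
# Placeholder coefficients (illustrative only, not fitted values).
COEFFS = {"zero": 5.4, "rot": 1.4, "hb": -4.7, "io": -8.3,
          "aro": -0.7, "lipo": -0.17}


def geometry_penalty(deviation, ideal_tol, max_tol):
    """Piecewise-linear scaling: 1 inside ideal_tol, 0 beyond max_tol."""
    d = abs(deviation)
    if d <= ideal_tol:
        return 1.0
    if d >= max_tol:
        return 0.0
    return 1.0 - (d - ideal_tol) / (max_tol - ideal_tol)


def f_pair(delta_r, delta_alpha):
    """Assumed scaling function: product of distance and angle penalties."""
    return (geometry_penalty(delta_r, 0.2, 0.6)
            * geometry_penalty(delta_alpha, 30.0, 80.0))


def empirical_score(n_rot, hbonds=(), ionic=(), aromatic=(), lipo=(), c=COEFFS):
    """hbonds, ionic, aromatic: iterables of (delta_r, delta_alpha) deviations;
    lipo: iterable of delta_r deviations of lipophilic contacts."""
    score = c["zero"] + c["rot"] * n_rot
    score += c["hb"] * sum(f_pair(dr, da) for dr, da in hbonds)
    score += c["io"] * sum(f_pair(dr, da) for dr, da in ionic)
    score += c["aro"] * sum(f_pair(dr, da) for dr, da in aromatic)
    score += c["lipo"] * sum(geometry_penalty(dr, 0.2, 0.6) for dr in lipo)
    return score
```

In a real empirical function the coefficients are obtained by regression against measured binding affinities of the training complexes, so they are only meaningful together with the interaction definitions used during fitting.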

2.4.3.3 Knowledge-Based Scoring. Because 
the forces that govern protein-ligand interac- 
tions are so complex, an implicit approach to 
capture all relevant terms of protein-ligand 
Figure 6.19. PMF score calculated for 132
protein-ligand complexes taken from the PDB
without overlap to the set of 697 complexes.
The PMF score was derived from Ref. 186.



binding seems very attractive. Borrowing
from statistical thermodynamics of liquids,
mean-field approaches derived solely from
structural information have been applied to
protein-ligand binding. Protein-ligand atom-pair
potentials can be calculated from structural
data (e.g., PDB), assuming that observed
crystallographic protein-ligand complexes
exhibit optimal placement. As an example, a
knowledge-based scoring function was derived
recently using 697 protein-ligand complexes
from the PDB as knowledge base. Using 16
protein and 34 ligand atom types, a total of 282
statistically significant interaction potentials
of atom pairs was derived. The final score is
calculated as the sum over all protein-ligand
atom-pair interactions.

$$
\text{PMF score} = \sum_{\substack{kl \\ r < r_{cutoff}^{ij}}} A_{ij}(r);
\qquad
A_{ij}(r) = -k_{B} T \ln\left[ f_{vol\_corr}^{j}(r)\, \frac{\rho_{seg}^{ij}(r)}{\rho_{bulk}^{ij}} \right]
\qquad (6.3)
$$

where kl is a ligand-protein atom pair of type
ij; $r_{cutoff}^{ij}$ designates the distance at which
atom-pair interactions are truncated (6 Å for
carbon-carbon interactions and 9 Å otherwise);
all $A_{ij}(r)$ are derived with a reference
sphere radius of 12 Å (184); $k_{B}$ is the Boltzmann
constant; T is the absolute temperature;
and $f_{vol\_corr}^{j}(r)$ is a ligand volume correction
factor that is introduced because intraligand
interactions are not accounted for (185, 186).
$\rho_{seg}^{ij}(r)$ designates the number density of
atom pairs of type ij at a certain atom-pair
distance r. $\rho_{bulk}^{ij}$ is the number density of a
ligand-protein atom pair of type ij in a
reference sphere with radius R (184). For use in
docking studies, the PMF score is combined 
with a vdW term to account for short-range 
interactions (187, 188). The PMF scoring 
function was implemented into the DOCK4.0 
program. For faster scoring it was also imple- 
mented on a grid similar to the force-field 
score in DOCK. Flexible docking experiments 
on FK506 binding protein (187), neuramini- 
dase (127), and stromelysin (189) showed high 
predictive power and robustness of the PMF 
score. Figure 6.19 shows the predictive power 
of the scoring function applied to 132 protein- 
ligand complexes taken from the PDB. 
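
The two ingredients of a knowledge-based score, deriving a pair potential from observed number densities and summing tabulated potentials over all ligand-protein atom pairs within the cutoff, can be sketched as follows. The table layout, the bin width, the atom-type labels, and the value of kT are assumptions of the example; it is not the published implementation.

```python
from math import dist, log

K_B_T = 0.592  # kcal/mol at about 298 K


def pair_potential(rho_seg, rho_bulk, vol_corr=1.0, kT=K_B_T):
    """A_ij(r) from the density ratio observed in the knowledge base."""
    return -kT * log(vol_corr * rho_seg / rho_bulk)


def pmf_score(ligand, protein, potentials, carbon_types=frozenset({"C"}),
              bin_width=0.2):
    """Sum tabulated pair potentials over ligand-protein atom pairs.

    ligand, protein: lists of (atom_type, (x, y, z))
    potentials: dict mapping (protein_type, ligand_type) to a list of
                A_ij values, one per distance bin of width bin_width
    """
    total = 0.0
    for l_type, l_xyz in ligand:
        for p_type, p_xyz in protein:
            table = potentials.get((p_type, l_type))
            if table is None:
                continue  # no statistically significant potential for this pair
            r = dist(l_xyz, p_xyz)
            both_carbon = p_type in carbon_types and l_type in carbon_types
            cutoff = 6.0 if both_carbon else 9.0
            index = int(r / bin_width)
            if r < cutoff and index < len(table):
                total += table[index]
    return total
```

A density ratio above 1 (the pair is found more often at this distance than in the bulk) yields a negative, that is favorable, contribution.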

2.4.3.4 Consensus Scoring. Consensus scoring
is an approach that combines several scoring
functions to find common hits. Such an
approach seems desirable because current
scoring functions lack robustness.
Charifson et al. (163) provided a comprehensive
consensus scoring study using DOCK and
Gambler, in combination with 13 scoring functions:
LUDI (104), ChemScore (190, 191),
Score (192), PLP (193), Merck force field
(194), DOCK energy score (146, 147), DOCK
chemical score, Flog (152), strain energy,
Poisson-Boltzmann (195), buried lipophilic surface
area (196), DOCK contact score (144), and
volume overlap (197). Three enzymes were used
as test proteins: p38 MAP kinase, inosine
monophosphate dehydrogenase, and HIV protease.
By comparing the performance of single
scoring functions with consensus scoring
schemes involving two or three scoring functions,
the authors found that false positives
(inactive compounds that have high predicted
scores) were significantly reduced in the latter
case. The authors estimated that a consensus
scoring approach would consistently provide
hit rates between 5 and 10% (5-10 out of 100
compounds tested show low μM activity) for
enzymes with reasonably buried binding sites.
A comparison of the different scoring functions
revealed that ChemScore, PLP, and
DOCK energy score performed best as single
scoring functions and also in consensus
combination. Consensus scoring experiments
reported by Bissantz et al. found that docking/
consensus scoring performances varied widely
among targets (198). Stahl and Rarey
suggested that combinations of FlexX and
PLP scores are ideal for consensus scoring for
a variety of targets including COX-2, ER, p38
MAP kinase, gyrase, thrombin, gelatinase A,
and neuraminidase (199).
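
One simple way to find common hits is to keep only those compounds that rank within the top fraction of the database according to every scoring function. The sketch below illustrates this intersection scheme; it is one of several possible consensus rules and is not claimed to be the protocol of the studies cited. The data layout and the convention that lower scores are better are assumptions.

```python
def consensus_hits(score_tables, top_fraction=0.1):
    """Compounds ranked in the top fraction by every scoring function.

    score_tables: list of dicts mapping compound id -> score,
                  one dict per scoring function (lower score = better).
    """
    hit_sets = []
    for scores in score_tables:
        n_top = max(1, int(len(scores) * top_fraction))
        ranked = sorted(scores, key=scores.get)
        hit_sets.append(set(ranked[:n_top]))
    return set.intersection(*hit_sets)
```

Working with ranks rather than raw scores avoids the problem that different scoring functions produce values on different scales.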

2.4.4 Docking as Virtual Screening Tool. A
virtual screening protocol is schematically
shown in Fig. 6.20. The necessary steps in- 
clude: protein structure preparation, ligand 
database preparation, docking calculation, 
and postprocessing. 

The protein has to be prepared only once 
for a virtual screening experiment unless dif- 
ferent protein conformations are considered. 
The receptor site needs to be determined and 
charges have to be assigned. The protein 
structure and the receptor site have to be mod- 
eled as accurately as possible. Determining 
protein surface atoms and site points as well 
as the assignment of interaction data, such as 



Figure 6.20. Flowchart of docking as virtual
screening tool in the example of FlexX.

Figure 6.21. Virtual screening filter cascade.



marking hydrogen-bond donors/acceptors, 
and so forth, are sometimes internally in- 
cluded in the docking software (e.g., in FlexX) 
and sometimes done separately (e.g., DOCK). 

Because of the large number of molecules, 
manual steps in the preparation of ligand da- 
tabases obviously have to be avoided. Starting 
typically from 2D structures, bond types have 
to be checked, protonation states must be de- 
termined, charges must be assigned, and sol- 
vent molecules removed. 3D coordinates can 
be generated using a program such as CON- 
CORD or CORINA (74) (see Section 2.3.2). 
Next, site points for hydrogen-bonding inter- 
actions have to be assigned and rotational bar- 
riers must be calculated. These tasks are 
sometimes included in the docking program 
(e.g., FlexX). 

The docking calculation is typically done 
for one ligand at a time. Depending on
optimization and sampling parameters as well as on
the flexibility of the compound, typically be- 
tween a few seconds and a few minutes of CPU 
time are needed to dock a ligand. Because the 
individual docking events are independent of 
each other, they can run on parallel hardware. 
Task schedulers that distribute ligand dock- 
ing on available CPUs are used in many dock- 
ing programs. 
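
Because the docking events are independent, the wall-clock time of a screen scales inversely with the number of processors, and distributing the work is a simple map over the ligand list. The sketch below shows both points. The callable `dock_one` is a hypothetical stand-in for a call to a docking program, and threads are used only to keep the example portable.

```python
from concurrent.futures import ThreadPoolExecutor


def estimate_wall_clock_hours(n_ligands, cpu_seconds_per_ligand, n_cpus):
    """Ideal wall-clock time when docking jobs are spread evenly over n_cpus."""
    return n_ligands * cpu_seconds_per_ligand / n_cpus / 3600.0


def dock_database(dock_one, ligands, n_workers=4):
    """Apply dock_one to every ligand; results keep the input order.

    dock_one is a hypothetical callable that docks a single ligand and
    returns its score. A real CPU-bound docking run would use separate
    processes or a cluster scheduler instead of threads.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(dock_one, ligands))
```

As a worked example under these idealized assumptions, docking 100,000 ligands at 60 s of CPU time each takes about 16.7 h on 100 processors, so roughly 70 processors are needed to finish within one day.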

Postprocessing of hits may include refinement
of placement using MD techniques and
specific pharmacophore-based filters that
penalize certain features, such as unformed
hydrogen bonds or other constraints that were
not met in the primary scoring function.
Because of the limitations of scoring functions, a
postscoring protocol can be used to reach
consensus about hits (discussed above). The
recognition of known active ligands mixed within
the database can be used to find an appropriate
threshold for separating the top-ranking
compounds from the rest of the database.
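
Using known actives to set the threshold can be sketched as follows: the database is ranked by score, and the cutoff is placed at the rank where a chosen fraction of the seeded actives has been recovered. The function names, the recall target, and the convention that lower scores are better are assumptions of the example.

```python
def cutoff_for_recall(scores, known_actives, recall=0.8):
    """Return (score cutoff, rank) at which the requested fraction of
    known actives has been recovered from the ranked database."""
    ranked = sorted(scores, key=scores.get)  # best (lowest) score first
    needed = recall * len(known_actives)
    found = 0
    for position, cid in enumerate(ranked, start=1):
        if cid in known_actives:
            found += 1
            if found >= needed:
                return scores[cid], position
    raise ValueError("not enough known actives in the score table")


def enrichment_factor(scores, known_actives, position):
    """Hit rate among the top `position` compounds relative to random picking."""
    ranked = sorted(scores, key=scores.get)
    found = sum(1 for cid in ranked[:position] if cid in known_actives)
    return (found / position) / (len(known_actives) / len(scores))
```

An enrichment factor close to 1 at the chosen cutoff indicates that the scoring function separates actives from the rest of the database no better than random selection.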

2.5 Filter Cascade 

Virtual screening is the process of reducing a 
given database as quickly and efficiently as 
possible to a small number of putative lead 
compounds for a given drug discovery project. 
The techniques described above form a cas- 
cade of different filter functions that are or- 
dered by their speed. Fast ADMET filters are 
followed by 2D and 3D pharmacophore filters 
and finally by docking and scoring methods. 
Figure 6.21 shows a scheme of a possible vir- 
tual screening filter cascade. 
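
A filter cascade of this kind amounts to applying a list of predicates in order of increasing cost, so that the expensive methods only see the compounds that survived the cheap ones. The sketch below is a minimal illustration; the compound fields and the filter thresholds in the usage example are hypothetical.

```python
def run_cascade(compounds, filters):
    """Apply filters in order and record how many compounds survive each.

    filters: list of (name, predicate) pairs ordered from fastest to slowest.
    """
    survivors = list(compounds)
    counts = [("input", len(survivors))]
    for name, keep in filters:
        survivors = [c for c in survivors if keep(c)]
        counts.append((name, len(survivors)))
    return survivors, counts


# Usage with hypothetical precomputed properties per compound.
example_filters = [
    ("admet", lambda c: c["mw"] <= 500),          # fast property filter
    ("pharmacophore", lambda c: c["pharm"]),      # 2D/3D pharmacophore match
    ("docking", lambda c: c["dock"] <= -20.0),    # slowest step last
]
```

Recording the survivor counts per stage shows where the database is reduced most and whether a filter is too strict or too permissive for the project at hand.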

3 APPLICATIONS 

3.1 Identification of Novel DAT Inhibitors
through 3D Pharmacophore-Based
Database Search

The dopamine transporter (DAT) is a
12-transmembrane helix protein that plays a critical
role in terminating dopamine neurotransmission
by taking up dopamine released into
the synapse. There is no experimental
structure available for DAT. However, an extensive
SAR of DAT inhibitors (mostly cocaine analogs)
is available. DAT is involved in several
diseases such as drug addiction and attention
deficit disorder (200). For example, Ritalin
[(±)-threo-methylphenidate], a DAT inhibitor,
is marketed for treating attention deficit
disorders in children (200, 201). Until recently,
all efforts in synthesizing DAT inhibitors
were focused on creating analogs around
the tropane, piperazine, methylphenidate,
and 2,3-dihydro-5-hydroxy-5H-imidazo[2,1-a]isoindole
cores. It was shown that, despite
structural differences, DAT inhibitors share
one or more common 3D pharmacophore models
(95, 202, 203). In an effort to identify new
chemical cores for developing DAT inhibitors
with new pharmacological profiles, a
pharmacophore-based 3D database search was
proposed (95). For this purpose a pharmacophore



model was derived based on two known potent
DAT inhibitors, R-cocaine and WIN-35065-2
(Fig. 6.22) (95). The common binding elements
of these compounds are a ring N that
may be substituted, a carbonyl oxygen, and an
aromatic ring that can be defined by the
position of its center (Fig. 6.22). Because both
compounds have some flexibility, a systematic
conformational search was performed to obtain
all possible conformations these compounds
can have when bound to DAT. To
identify structurally diverse conformers,
clustering of the generated conformers was done.
Measuring distances among chosen pharmacophore
elements in the generated conformers
led to the distances shown in Fig. 6.22.
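
Turning a set of conformers into distance ranges between pharmacophore elements is a small computation: for each pair of elements, the minimum and maximum distance over all retained conformers define the range. The sketch below assumes that each conformer has already been reduced to the coordinates of its pharmacophore elements; the element names in the usage note are hypothetical labels.

```python
from math import dist


def distance_ranges(conformers, element_pairs):
    """Minimum and maximum distance per element pair over all conformers.

    conformers: list of dicts mapping element name -> (x, y, z)
    element_pairs: list of (name_a, name_b) tuples
    """
    ranges = {}
    for a, b in element_pairs:
        distances = [dist(conf[a], conf[b]) for conf in conformers]
        ranges[(a, b)] = (min(distances), max(distances))
    return ranges
```

For a three-point pharmacophore such as the one described here, the pairs would be (ring N, carbonyl O), (ring N, ring center), and (carbonyl O, ring center), and the resulting ranges form the 3D query.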

Recently, analysis of several large chemical
databases showed that the NCI database has 
by far the highest number of unique com- 
pounds (204). Thus this database provides a 
large number of unique synthetic compounds 
and natural products and is an excellent re- 



Figure 6.22. Pharmacophore proposed for
identifying DAT inhibitors. The pharmacophore
was obtained based on two known DAT
inhibitors, R-cocaine and WIN-35065-2.
Distance ranges between pharmacophore points
were obtained through systematic search of
all possible conformations that the
two compounds may adopt when
bound to DAT.

Figure 6.23. Flowchart showing steps used in lead
identification using pharmacophore-based 3D
database searching.



source for drug lead discovery. Using the 3D 
pharmacophore from Fig. 6.22, the NCI 3D- 
database (67) of 206,876 "open compounds" 
was searched using the program Chem-X 
(205). The strategy used for identifying leads 
through virtual screening is shown in Fig. 
6.23. During the search each compound was 
first checked as to whether it had the pharma- 
cophore elements and second as to whether it 
had any acceptable conformation matching 
the distance requirements. Up to 3 million 
conformations were examined for each com- 
pound. A total of 4094 compounds, 2% of the 
database, were identified as "hits." This num- 
ber was further reduced using filters such as 
molecular weight, structural novelty, simplic- 
ity, diversity, and hydrogen-bond acceptor ni- 
trogen. Seventy compounds were selected for 
testing in biochemical assays. Forty-four com- 
pounds displayed more than 20% inhibition at 
10 μM in the [3H]mazindol binding assay,
from which three compounds were chosen for 
deriving an SAR (Fig. 6.24). These results sug- 
gested that the 3D pharmacophore-based da- 
tabase search is an efficient tool for identifying 
novel DAT inhibitors. 
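
The two-step test described above (presence of the pharmacophore elements, then at least one conformation satisfying the distance requirements) can be sketched in a few lines. This is only an illustration and not the Chem-X implementation: the element names, the data layout, and the distance ranges below are hypothetical placeholders, and the actual ranges are those shown in Fig. 6.22.

```python
from math import dist

# Placeholder distance ranges (angstroms) between three pharmacophore
# elements. These values are assumptions for illustration only; they are
# NOT the ranges given in Fig. 6.22.
DISTANCE_RANGES = {
    ("N", "O"): (2.0, 4.5),
    ("N", "ring"): (5.0, 7.0),
    ("O", "ring"): (3.5, 6.0),
}


def conformer_matches(conformer, ranges=DISTANCE_RANGES):
    """Return True if every element-element distance lies inside its range.

    `conformer` maps an element name to its (x, y, z) coordinates.
    """
    for (a, b), (low, high) in ranges.items():
        if not low <= dist(conformer[a], conformer[b]) <= high:
            return False
    return True


def is_hit(compound, ranges=DISTANCE_RANGES):
    """Step 1: all elements present. Step 2: some conformer fits."""
    required = {name for pair in ranges for name in pair}
    if not required <= set(compound["features"]):
        return False
    return any(conformer_matches(c, ranges) for c in compound["conformers"])


# Hypothetical compound with two conformers; only the second one fits.
bad = {"N": (0, 0, 0), "O": (1, 0, 0), "ring": (3, 5, 0)}
good = {"N": (0, 0, 0), "O": (3, 0, 0), "ring": (3, 5, 0)}
compound = {"features": ["N", "O", "ring"], "conformers": [bad, good]}
print(is_hit(compound))
```

Checking for the elements first is cheap and avoids generating or scanning conformations for compounds that cannot match at all.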



3.2 Discovery of Novel Matriptase Inhibitors
through Structure-Based 3D Database
Screening

Matriptase is a trypsinlike serine protease 
that was proposed to be involved in tissue re- 
modeling, cancer invasion, and metastasis 
(206). Potent and selective matriptase inhibi- 
tors not only would be useful for further elu- 
cidation of the role matriptase has in biologi- 
cal systems but also may be used for the 
treatment and/or prevention of cancers. Hepa- 
tocyte growth factor activator inhibitor 1 
(HAI-1) is a natural inhibitor of matriptase. 
Thus, by analyzing interactions in the com- 
plex of matriptase with HAI-1, crucial interac- 
tions that an inhibitor should capture can be 
identified. Consequently, the strategy for
identifying inhibitors was to build the
matriptase-HAI-1 Kunitz domain 1 complex,
identify binding regions on matriptase, screen
the NCI 3D database for hits that capture
binding groups of HAI-1 to matriptase, and
finally test the hits biochemically (Fig. 6.25). The
structure of matriptase was obtained from
PDB entry 1EAW (207). Homology modeling,
as implemented in MODELLER (208, 209),
was chosen to build the 3D structure of the
Kunitz domain 1 from KSPI. The complex of
matriptase with HAI-1 Kunitz domain 1 was
built using a combination of manual docking
and molecular dynamics refinement with the
program CHARMM (210). The obtained bind-
ing mode of HAI-1 Kunitz domain 1 to
matriptase (Fig. 6.26) suggests that three re-
gions might be important for inhibitor bind-
ing. The S1 binding site Asp185, which is char-
acteristic of trypsinlike serine proteases, is the
specificity pocket used to recognize substrates
with Arg or Lys as P1 residue. The anionic
site, defined by Asp96, Asp60.A, and Asp60.B,
is the site at which Arg258 from HAI-1 binds.
A hydrophobic region defined by Ile41 and
Tyr60.G might also be important for specific-
ity of future matriptase inhibitors.

Thus, the active site used for in silico
screening with the program DOCK com-
prises all three binding regions. Energy scoring
was used for ranking docked compounds. The
top 2000 compounds were considered for se-
lecting potential inhibitors. Given that
matriptase prefers positively charged residues
in the P1 position, inhibitors should also have
positively charged groups to bind efficiently to
Asp185 from the S1 site of matriptase (Fig.
6.27). Note that a more efficient way of doing
the virtual screening presented above is to do
a pharmacophore search first followed by
docking. Thus, 69 compounds were selected
for biochemical testing at 75 μM inhibitor and
matriptase concentration. Initial screening
showed that 50% of compounds tested pro-
duced more than 70% inhibition of enzymatic
activity when the ratio was one inhibitor mol-
ecule for one protein molecule (Table 6.4). It
should be noted that screening results at sin-
gle dose and IC50 depend on the protein con-
centration, whereas Ki is concentration inde-
pendent. From the hits in the screening step
bis-benzamidines were chosen for Ki determi-
nation (Table 6.5) because this class of com-
pounds could bind to both the S1 site and the
anionic site. These results show that combin- 
ing a pharmacophore hypothesis with a struc- 
ture-based database search can provide an ef- 
ficient way of identifying leads for a drug 
design project. 
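
The remark that single-dose inhibition and IC50 depend on protein concentration, whereas Ki does not, can be made concrete with the standard Morrison equation for tight-binding inhibitors. The sketch below is not taken from the source: it assumes reversible 1:1 binding, treats the apparent Ki as a given input, and uses made-up Ki values purely for illustration. All concentrations must share one unit (μM here).

```python
from math import sqrt


def fractional_activity(e_total, i_total, ki_app):
    """Morrison equation: remaining activity v_i/v_0 (between 0 and 1).

    Assumes reversible 1:1 binding; e_total, i_total, and ki_app are in
    the same concentration unit.
    """
    s = e_total + i_total + ki_app
    bound = (s - sqrt(s * s - 4.0 * e_total * i_total)) / 2.0  # [EI]
    return 1.0 - bound / e_total


def ic50(e_total, ki_app):
    """For tight binding, IC50 = Ki(app) + [E]total / 2."""
    return ki_app + e_total / 2.0


# Hypothetical inhibitors screened at 75 uM enzyme and 75 uM compound
# (the 1:1 condition of Table 6.4); the Ki values are invented.
for ki in (0.1, 1.0, 10.0):
    inhibition = 100.0 * (1.0 - fractional_activity(75.0, 75.0, ki))
    print(f"Ki = {ki:5.1f} uM  inhibition = {inhibition:5.1f}%  "
          f"IC50 = {ic50(75.0, ki):5.1f} uM")
```

Under these assumptions the IC50 can never fall below half the enzyme concentration, which is why Ki rather than IC50 is the appropriate quantity for comparing hits obtained at high protein concentration.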
Figure 5.5. GRID probes on
Factor Xa site and the com-
bined resultant complementary
site points that can be used for
pharmacophore fingerprint cal-
culations (lower right).

Figure 5.6. Overview of the Gridding and Partitioning (GaP) procedure as applied to monomers,
exemplified using phenylalanine as a potential primary amine. This molecule thus contains two
pharmacophoric groups (the aromatic ring and the carboxylic acid). During the conformational
analysis the locations of these pharmacophoric groups are tracked within a regular grid.
[Reproduced from A. R. Leach and M. M. Hann, Drug Discovery Today, 5, 326-336 (2000), with
permission of Elsevier Science.]

Figure 5.13. (a) A virtual library of 634,721 allowed combinatorial AB products (after filtering out
products that failed Lipinski's Rule of 5 "druglike" criteria) shown in a BCUT chemistry space
specifically chosen to best represent the diversity of the virtual library. (b) The maximally diverse
9600-compound subset of the virtual library, illustrating the results of purely product-based
"library design." Although providing the maximal diversity, synthesis of these 9600 AB products
would require the use of 347 A's and 1024 B's, clearly unacceptable from the perspective of syn-
thetic economy (number of reactants and robotic control). (c) The 9600-compound library resulting
from the traditional, purely reactant-based library design strategy of selecting the 80 most diverse
A's and the 120 most diverse B's. Although providing user-selected synthetic economy, the diversi-
ty of these 9600 AB products is clearly quite poor. (d) The 9600-compound library resulting from
the reactant-biased, product-based (RBPB) algorithm developed by Pearlman and Smith (see Refs.
31, 87c and text). The algorithm selected a different set of 80 A's and a different set of 120 B's, thus
providing the same level of user-selected synthetic economy, while also providing substantially
greater diversity than could be achieved using a purely reactant-based library design strategy.



Figure 5.25. The 3D subspace most recep-
tor relevant for members of the GPCR-PA+
family of receptors. Points indicate coordi-
nates of 187 published ligands of various
GPCR-PA+ receptors. Some have been
color-coded by receptor for illustrative pur-
poses. See Refs. 32e,i and text for further
details.

Figure 5.26. The same 3D
subspace as in Fig. 5.25,
rotated slightly to provide a
better viewing perspective.
Points indicate coordinates of
about 2000 combinatorial
products selected from 14 dif-
ferent libraries. Color-coding
indicates affinity for the
GPCR-1 receptor. See Refs.
32e,i and text for further
details.




Figure 10.2. View of (4) (2,3-DPG)
binding site at the mouth of the β-cleft
of deoxy hemoglobin.






Figure 10.3. Stereoview of allosteric binding site
in deoxy hemoglobin. A similar compound environ-
ment is observed at the symmetry-related site, not
shown here. (a) Overlap of four right-shifting
allosteric effectors of hemoglobin: (6a) (RSR13,
yellow), (6b) (RSR56, black), (7a) (MM30, red),
and (7b) (MM25, cyan). The four effectors bind at
the same site in deoxy hemoglobin. The stronger
acting RSR compounds differ from the much weak-
er MM compounds by reversal of the amide bond
located between the two phenyl rings. As a result,
in both RSR13 and RSR56, the carbonyl oxygen
faces and makes a key hydrogen bonding interac-
tion with the amine of αLys99. In contrast, the car-
bonyl oxygen of the MM compounds is oriented
away from the αLys99 amine. The αLys99 interaction
with the RSR compounds appears to be critical in
the allosteric differences. (b) Detailed interactions
between RSR13 (6a) and hemoglobin, showing key
hydrogen bonding interactions that help constrain
the T-state and explain the allosteric nature of the
compound and those of other related compounds.




Figure 10.4. Stereoview of superimposed binding
sites for (8b) (5-FSA, yellow) and (8a) (DMHB,
magenta) in deoxy hemoglobin. A similar com-
pound environment is observed at the symmetry-
related site and therefore not shown here. Both
compounds form a Schiff base adduct with the
α1Val1 N-terminal nitrogen. Whereas the m-car-
boxylate of 5-FSA forms a salt bridge with the
α2Arg141 (opposite subunit), this intersubunit
bond is missing in DMHB. The added constraint to
the T-state by 5-FSA that ties two subunits togeth-
er shifts the allosteric equilibrium to the right. On
the other hand, the binding of DMHB does not add
to the T-state constraint. Instead, it disrupts any
T-state salt- or water-bridge interactions between
the opposite α-subunits. The result is a left shift of
the oxygen equilibrium curve by DMHB.

Figure 10.5. Stereoview of the binding site for (9)
(n = 3, TB36, yellow) in deoxy Hb. A similar com-
pound environment is observed at the symmetry-
related site, not shown here. One aldehyde is cova-
lently attached to the N-terminal α1Val1, whereas
the second aldehyde is bound to the opposite sub-
unit, α2Lys99 ammonium ion. The carboxylate on
the first aromatic ring forms a bidentate hydrogen
bond and salt bridge with the guanidinium ion of
α2Arg141 of the opposite subunit. The effector thus
ties two subunits together and adds additional con-
straints to the T-state, resulting in a shift in the Hb
allosteric equilibrium to the right. The magnitude of
constraint placed on the T-state by the crosslinked
αLys99 varies with the flexibility of the linker.
Shorter bridging chains form tighter crosslinks and
yield larger shifts in the allosteric equilibrium.

Figure 10.6. Binding
site for (10) (N10-
propynyl-5,8-
dideazafolate), within
the active site of
thymidylate synthase
from Escherichia coli.
The surface of the
inhibitor is shown in
the left view. The red
spheres in the left
view are tightly bound
water molecules.




Figure 10.9. (b) Active site with bound (31)
[saquinavir (PDB code 1HXB)]. Note the asym-
metry of inhibitor binding. The flap water that
is shown very close to saquinavir is labeled W.





Figure 10.10. Comparison of the structures of HIV-
P apoenzyme monomer (top, PDB code 3PHV) and
the complex between HIV-P and (32) (U-85548; bot-
tom, PDB code 8HVP). The inhibitor is shown as a
ball and stick structure. Note the rearrangement of
the flap residues; Ile50 is indicated for reference.
The van der Waals surface of Asp25 is shown in both
structures. The flap water (red ball) is also shown
between Ile50 and U-85548. In the bottom struc-
ture, the locations of the N and C termini of HIV-P
are noted.



Figure 10.11. Orthogonal views of
the complex between HIV-P and (32)
(U-85548). The view in panel a is
rotated approximately 90° (around
the long axis of the protein) from the
view in panel b. Van der Waals sur-
faces of Asp25, Asp25', and the flap
water (W) are shown. In panel b, the
solvent-accessible surface of the
inhibitor is shown.




Figure 10.17. Structure of
rhinovirus capsid protein
VP1 showing the bound con-
formation of antiviral isoxa-
zole compounds (78) [dis-
oxaril, WIN-51711: panel a,
top], (79) [WIN-54954: panel
b, middle], and (80) [ple-
conaril, WIN-63843: panel c,
bottom]. The PDB codes for
the X-ray structural model
coordinates used to create
these views are: 1PIV (for
78), 2HWE (for 79), and
1C8M (for 80). On the left
side of each panel, the
inhibitors are shown as van
der Waals surfaces, and the
protein as a ribbon diagram.
On the right side, the struc-
tures of the inhibitor alone
are shown, from the same
view, as ball and stick repre-
sentations.

Figure 10.18. Binding of SB203580 (shown as a ball and stick model) to
MAPK p38α. In addition to the side chains of the labeled residues, the protein backbone between
Leu104 and Met109 is shown, as well as several aliphatic side chains and a water molecule (red
sphere). Hydrogen bonds (dotted lines) are shown between the backbone amide of Met109 and the
inhibitor's pyrimidinyl nitrogen, and between the ε-amino of Lys53 and the inhibitor's imidazole
N3. This figure is based on the PDB coordinate set 1A9U (187).




Figure 11.6. Three density maps at differing resolutions: a, 1.3 Å; b, 2.1 Å; c, 3.0 Å.




Figure 11.7. (b) Structure of the LuxS
monomer highlighting the bound zinc ion
(magenta) and methionine (green).










Figure 14.6. Examples of macromolecules studied by cryo-EM and 3D image reconstruction and the
resulting 3D structures (bottom row) after cryo-EM analysis. All micrographs (top row) are displayed at
about 170,000× magnification and all models at about 1,200,000× magnification. (a) A single particle
without symmetry: The micrograph shows 70S E. coli ribosomes complexed with mRNA and fMet-
tRNA. The surface-shaded density map, made by averaging 73,000 ribosome images from 287 micro-
graphs, has a resolution (FSC) of 11.5 Å. The 30S and 50S subunits and the tRNA are colored blue, yel-
low, and green, respectively. The identity of many of the subunits is known, and some RNA double helices
are clearly recognizable by their major and minor grooves (e.g., helix 44 is shown in red). [Courtesy of
J. Frank (SUNY, Albany), using data from Gabashvili et al. (86).] (b) A single particle with symmetry:
The micrograph shows hepatitis B virus cores. The 3D reconstruction, at a resolution of 7.4 Å (DPR),
was computed from 6384 particle images taken from 34 micrographs. [From Böttcher et al. (44).] (c) A
helical filament: The micrograph shows actin filaments decorated with myosin S1 heads containing the
essential light chain. The 3D reconstruction, at a resolution of 30-35 Å, is a composite in which the dif-
ferently colored parts are derived from a series of difference maps that were superimposed on f-actin.
The components include: f-actin (blue), myosin heavy chain motor domain (orange), essential light chain
(purple), regulatory light chain (white), tropomyosin (green), and myosin motor domain N-terminal
beta-barrel (red). [Courtesy of A. Lin, M. Whittaker, and R. Milligan (Scripps Research Institute,
La Jolla, CA).] (d) A 2D crystal, light-harvesting complex LHCII at 3.4-Å resolution. The model shows
the protein backbone and the arrangement of chromophores in a number of trimeric subunits in the
crystal lattice. In this example, image contrast is too low to see any hint of the structure without image
processing (see also Fig. 14.3). [Courtesy of W. Kühlbrandt (Max-Planck-Institute for Biophysics,
Frankfurt, Germany).]





Figure 15.35. GRAB peptidomimetics in action. 





Figure 6.27. Potential functional groups
in inhibitors necessary to block the speci-
ficity pocket (S1 binding site) of matriptase.
The choice reflects the observation that
matriptase prefers substrates with Lys or
Arg as P1 residue.

ity needs to be included in high-throughput 
docking. Scoring functions have to improve to 
make consistently correct predictions of puta- 
tive protein-ligand binding affinities. Scoring 
functions, calibrated to reproduce experimen- 
tal data, have unreliable performance outside 
their training set. Thus, de novo methods us- 
ing terms describing the thermodynamics of 
binding should replace the first generation of 
scoring functions. In consequence, some of the 
speed gained from low cost parallel computing 
should be invested into higher accuracy scor- 
ing rather than higher throughput. One way 
of increasing throughput is to keep the num- 
ber of compounds docked as small as possible 
by using every bit of knowledge one has to 
prefilter the database, mainly based on phar-
macophore information. In some cases this is
easy. For instance, the necessity of having cer-
tain features like salt bridges formed on ligand
binding [e.g., in influenza virus neuramini-
dase (211)] or other prevalent information
(e.g., hinge region binding for many ATP com-
petitive kinase inhibitors) greatly helps to re-
duce the number of compounds subjected to
docking experiments.

Table 6.4 Results Obtained from the Initial
Screening of Compounds against Matriptase
(206)(a)

% Inhibition          Number of Compounds
Over 95%              15
90-94%                4
70-89%                15
40-69%                13
Below 39%             17
High absorbency       3
Increased activity    3

(a) Testing was done at 75 μM compound and protein con-
centration. The ratio between compound and protein molar
concentration was 1:1.

The limited robustness of many structure-
based docking/scoring techniques raises the
question of when one should apply them and
when one should retreat to pharmacophore-
based virtual screening. In many cases it
makes sense to prescreen virtual libraries us-
ing pharmacophore techniques. Particularly if
one uses shape representations of the receptor
site, such as volume-exclusion spheres, a
pharmacophore search can be a very effective
prefilter. Also, in cases where receptor-site
flexibility is problematic, pharmacophore
searching may be less restrictive (unless one
tries to deal with protein flexibility in the
docking routine, a task that is not easy, usu-
ally not applied today, and another future di-
rection of development in virtual screening).

The above tools and pathways show a sim- 
ple and inexpensive way of discovering novel 
lead chemical matter for drug discovery pro- 
grams. However, there are many hurdles to 
overcome to make virtual screening success-
ful. The properties of druglikeness may not be
understood well enough, resulting in
poor pharmacokinetics of the compounds; ex-
isting SARs that lead to the generation of
pharmacophore models may bias the pharma-
cophore toward a narrow segment of com-
pounds; structural information of the target
protein is often not available; and homology
models may not be precise enough. Current
scoring functions are often not robust enough
to separate actives from inactives. Compounds
identified may not be easy to synthesize. Hits
may not be selective or patentable.

Table 6.5 Ki Values Obtained for Tested bis-Benzamidines against Matriptase

Compound    Structure



On one hand, there are obviously many 
risks involved in virtual screening, many as- 
sumptions made, and a positive outcome not 
at all guaranteed in each and every case. On 
the other hand, however, the overall process is 
extremely cost effective and fast. Even if suc- 
cessful in only a few cases, virtual screening 
can produce leads that may otherwise not 
have surfaced and so add immense value to a 
drug discovery program. Especially in cases
where high throughput screening cannot
identify a viable lead chemical matter, virtual
screening applied to vendor databases or combi-
natorial libraries to be synthesized presents a
cost-effective alternative. Mainly because of its
speed, cost effectiveness, ease of setup, and in-
creasing robustness, we expect virtual screening
to become a mainstream approach throughout
the pharmaceutical industry.



1 INTRODUCTION

The action of drug molecules and the function
of protein targets are governed by principles of
molecular recognition. Binding events be-
tween ligands and their receptors in biological
systems form the basis of physiological activ-
ity and pharmacological effects of chemical
compounds. Accordingly, the rational develop-
ment of new drugs requires an understanding
of molecular recognition in terms of both
structure and energetics (1). With respect to
practical applications, it requires tools that
are based on such knowledge of mutual recog-
nition between molecular structures. Docking
and virtual screening are computational tools
to investigate the binding between macromo-
lecular targets and potential ligands. They
constitute an essential part of structure-based
drug design, the area of medicinal chemistry
that harnesses structural information for the
purpose of drug discovery.

Structure-based design has become an in-
tegral part of medicinal chemistry. Although
the knowledge about molecular recognition
and its foundation on structural principles is
still far from being complete, it has already
fueled significant advances and contributed to
many success stories in drug discovery (2).
Convincing evidence has been accumulated
for a large number of targets that the protein
three-dimensional (3D) structure can be used
to design small molecule ligands binding
tightly to the protein. Several marketed com-
pounds can indeed be attributed to successful
structure-based design (3-6), as summarized
in a number of recent reviews (2, 7-9).

Structure-based drug design is an iterative
process (2, 10). It requires as the starting point
the crystal structure or a reliable homology
model of the target protein, preferentially
complexed with a ligand. The first step of the
process is a detailed analysis of the binding
site and a compilation of all aspects possibly
responsible for binding affinity and selectivity.
These data are then used to generate new
ideas on how to improve existing ligands or to
develop alternative molecular frameworks.
Computational methods and molecular mod-
eling play an essential role in this phase of
hypothesis generation. They help to exploit in-
formation about the binding site geometry by
constructing new molecules de novo, by ana-
lyzing known molecules with respect to their
affinity and binding geometry, or by searching
compound libraries for potential hits to sug-
gest new leads. Discovered hits that are com-
mercially available or synthetically accessible
are then experimentally tested and their bind-
ing properties examined by biochemical, crys-
tallographic, and spectroscopic methods. The
3D structure of new complexes together with
the acquired activity data are subsequently
used to start a new cycle of ligand design to
improve the hypotheses stated in the previous
round.

Since the introduction of computational
structure-based design techniques into the
drug discovery process in the early 1980s, the
impact of these methods has significantly
changed. Initially, computational tools were
often applied a posteriori to rationalize and
understand the binding and structure-activity
relationships in a series of inhibitors and to
assist in the manual design of individual com-
pounds. Guided by the creativity of the de-
signer, a novel putative ligand was con-
structed using computer graphics. Molecular
mechanics calculations were performed on the
produced protein-ligand complex to assess the
properties of the generated ligand in terms of
a geometric and energetics analysis. A ligand
was assumed to bind with high affinity if sat-
isfactory complementarity in shape and sur-
face properties between the protein and the
ligand could be detected.

It has been realized, however, that the de-
sign of a single, synthetically accessible, active
compound is a larger challenge than antici-
pated. Many phenomena of molecular recogni-
tion are not yet fully understood, nor are cur-
rent modeling tools able to reflect and
accordingly predict them with sufficient reli-
ability. Most important, a fast and accurate
computational prediction of binding affinities
for new inhibitor candidates is still difficult to
obtain. Although the existing tools certainly
do not allow the medicinal chemist to de-
sign the one perfect ligand, they can help to
enrich sets of molecules with more active ones,
even though the known deficiencies of the
methods can still lead to significant rates of
both false positives and false negatives. A
more moderate goal of current molecular de-
sign is thus to improve the hit rates of mole-
cules suggested for biological assaying com-
pared to a mere random compound selection
or testing of nontargeted compound libraries.
This implies that structure-based design ap-
proaches now focus on the processing of large
numbers of molecules, arranged in so-called
virtual libraries. These can be composed of ei-
ther existing chemical substances (such as, for
example, compound collections of a pharma-
ceutical company) or hypothetical new mole-
cules that could be synthesized by combinato-
rial chemistry. The task is then to filter these
large libraries by eliminating the majority of
molecules that are rather unlikely to bind and
by prioritizing the remaining ones. As recent
experience shows, this strategy can be success-
ful: several publications have reported quite
impressive enrichments of active compounds 
(11-16). 
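
Improvement over random selection is commonly quantified as an enrichment factor, the hit rate in the selected subset divided by the hit rate in the whole library. The sketch below uses this common definition, which is not spelled out in the text above; the function names and the numbers are hypothetical.

```python
def hit_rate(n_active, n_tested):
    """Fraction of tested compounds that turn out to be active."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return n_active / n_tested


def enrichment_factor(actives_selected, n_selected,
                      actives_library, n_library):
    """Hit rate of the selected subset relative to the whole library.

    A value of 1 means the selection performs no better than random
    picking; larger values mean the selection is enriched in actives.
    """
    baseline = hit_rate(actives_library, n_library)
    if baseline == 0:
        raise ValueError("library contains no actives")
    return hit_rate(actives_selected, n_selected) / baseline


# Invented example: 10 actives among 100 selected compounds, taken from
# a library of 10,000 compounds that contains 50 actives in total.
print(enrichment_factor(10, 100, 50, 10_000))
```

In a prospective screen the number of actives in the full library is usually unknown, so the baseline is typically estimated from a random or HTS hit rate instead.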

The change of focus from single molecule to
compound library design in modern structure-
based drug discovery is also a consequence of
major technological advances that have dra-
matically enhanced the data throughput in a
variety of fields:

1. Progress in gene technology, protein chem-
istry, and structure determination tech-
niques has resulted in a tremendous
increase in protein structure informa-
tion. The number of publicly available
3D protein structures continues to grow
exponentially, with further acceleration
expected from the current initiatives of
structural genomics (17). As a conse-
quence, more and more design projects
are based on structural information, and
structure-based ligand design has be-
come routine at all major pharmaceutical
companies. On the other hand, the grow-
ing amount of structural knowledge also
calls for automated methods that make
this new wealth of data accessible and
available.

2. Automation and miniaturization have led
to the development of high-throughput
screening (HTS), which is now a well-es-
tablished process for large-scale biological
testing. Libraries of several hundred thou-
sand compounds are routinely screened
against new targets, frequently on a time
scale of less than 1 month.

3. The characteristics of synthetic chemistry
have significantly changed with the intro-
duction of combinatorial and parallel
chemistry techniques. The trend contin-
ues to move away from the synthesis of in-
dividual compounds toward the generation
of compound libraries, whose members
are accessible through the same type of
chemical reaction but different building
reagents.

4. Massive data processing and computa-
tional tasks formerly requiring expensive
supercomputers have become generally
feasible through advances in PC cluster com-
puting, offering supercomputing power
at unprecedented performance-to-price
ratios.

To provide competitive advantage, structure-
based design tools must be fast enough to process
thousands of compounds per day using affordable
computing resources. Several algorithms have
been developed that allow virtual ligand screening
based on high-throughput flexible docking (18,
19). More sophisticated methods may then be ap-
plied at a later stage of refinement, where a
smaller set of compounds, containing the most
promising hits, is subjected to a more detailed
analysis. Essential elements of all these docking
tools are scoring functions that translate computa-
tionally generated protein-ligand binding geome-
tries into estimates of affinity.

Docking as a computational tool of struc-
ture-based drug design to predict protein-
ligand interaction geometries and binding af-
finities is the subject of this chapter. Funda-
mental aspects of the docking process, the
scoring of protein-ligand interactions, and the
application of docking to virtual ligand screen-
ing are discussed. First, a discussion of the
underlying physical principles determining
protein-ligand recognition is given (Section
2.1), followed by a description of the general
concepts of docking, scoring, and virtual
screening (Section 2.2). Subsequently, the
current approaches to the docking problem
are presented (Section 3.1), focusing on the
search methods (Section 3.1.3) and the ap-
proaches used to represent protein and ligand
structures in an efficient way (Sections 3.1.1-
3.1.2). In addition, a number of special aspects
are discussed (Section 3.2), including, for exam-
ple, the issues of protein flexibility (Section
3.2.1) or the consideration of water molecules
in the context of docking (Section 3.2.2). This
is followed by a section on scoring functions
used for docking. Three major classes of scor-
ing functions are presented (Section 4.1) and
subjected to critical assessment (Section 4.2).
A final section is dedicated to virtual screen-
ing, illustrating general strategies (Section 5),
special problems (Sections 5.1-5.5), and repre-
sentative applications (Section 5.6).

Although the goal of this chapter is to high-
light the most important aspects of docking in
the context of structure-based drug design, a
comprehensive discussion of all aspects of cur-
rent docking methodologies would be beyond
its scope. Protein-protein or protein-DNA
docking (20-24), as well as the docking of
small ligands to DNA or RNA as targets (25),
are not explicitly covered. The focus will be on
the docking of small molecules to protein bind-
ing sites, with emphasis on automated proce-
dures. Interactive docking tools, as provided
by some modeling software packages, or frag-
ment-based de novo design methods are not
discussed. Information about the latter is
available through other reviews (26, 27) (see
also Ref. 28 for a list of de novo design algo-
rithms). Virtual screening is discussed only in
the context of docking; database screening
techniques based on molecular similarity or
pharmacophore models are not considered
(29, 30). As a final introductory remark, it
should be emphasized that docking methods
are actually tools for "ligand" (rather than
"drug") design. The identification of a tight-
binding ligand is a necessary but not sufficient
criterion toward a promising novel lead struc-
ture and its development into a drug. Aspects
of synthetic accessibility, bioavailability, or
toxicity are not the primary subject of docking,
but because it is important to consider these
factors at early stages of a design project, fil-
ters may be applied to compound libraries be-
fore docking or to hits obtained from virtual
screening at an early stage. This aspect of pre-
or postprocessing is discussed only briefly.

2 GENERAL CONCEPTS AND PHYSICAL 
BACKGROUND 

2.1 Protein-Ligand Interactions and the 
Physical Basis of Biomolecular Recognition 

The selective binding of a small-molecule li¬ 
gand to a specific protein is determined by 
structural and energetic factors. For ligands of 
pharmaceutical interest, protein-ligand bind¬ 
ing usually occurs through noncovalent inter¬ 
actions. The physical basis of noncovalent 
interactions is generally well established 
through the theories of electromagnetic forces 
or, on a more fundamental level, of quantum 
mechanics. For macromolecules, liquid systems,
or solutions, however, direct application
of these first principles is significantly
complicated by the size and complexity of the
systems, in which a large number of fluctuating
particles simultaneously interact and influence
each other. Principles from classical mechanics
and heuristic models are therefore frequently
used as an approximation to describe
protein-ligand interactions in aqueous solution.


The primary forces acting between a pro¬ 
tein and a ligand are all of electrostatic nature. 
It is the interaction between explicit charges, 
dipoles, induced dipoles, and higher electric 
multipoles that leads to phenomena that are 
commonly referred to as salt bridges, hydro¬ 
gen bonds, or van der Waals interactions. In 
simplified classifications, it is only the charge- 
charge interaction that is called electrostatic. 
This interaction between two charges is of 
long range and considerable strength. In vac¬ 
uum or uniform media it can be described by 
Coulomb's law. In aqueous solution of biomol¬ 
ecules, however, its application is complicated 
because of the presence of a large number of 
water molecules. Unless a sufficiently large 
number of water molecules is explicitly in¬ 
cluded in the calculations [as usually only 
tractable in computationally expensive molec¬ 
ular dynamics simulations (31)], the correct 
treatment of electrostatic interactions in solu¬ 
tion requires solving the Poisson-Boltzmann 
equation, where the solvent is considered as a 
continuous medium of high dielectric constant 
surrounding a low-dielectric solute (32). 
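
As a rough numerical illustration of why the solvent matters, Coulomb's law can be evaluated for a single ion pair in vacuum and in a uniform medium. This is only a minimal sketch and not a substitute for the Poisson-Boltzmann treatment: the function name `coulomb_energy`, the 3 Å separation, and the relative permittivity of 80 for a water-like continuum are assumptions chosen for illustration.

```python
# Coulomb's law for two point charges in a uniform dielectric (illustrative sketch).
# Assumption: a single relative permittivity eps_r describes the whole medium,
# which is exactly the simplification that breaks down for a solvated protein.

# e^2 / (4 * pi * eps0), expressed in kJ * Angstrom / (mol * e^2)
COULOMB_CONSTANT = 1389.35


def coulomb_energy(q1, q2, r_angstrom, eps_r=1.0):
    """Energy in kJ/mol of charges q1, q2 (in units of e) at distance r (Angstrom)."""
    if r_angstrom <= 0 or eps_r <= 0:
        raise ValueError("distance and permittivity must be positive")
    return COULOMB_CONSTANT * q1 * q2 / (eps_r * r_angstrom)


if __name__ == "__main__":
    vacuum = coulomb_energy(+1, -1, 3.0)             # ion pair in vacuum
    water = coulomb_energy(+1, -1, 3.0, eps_r=80.0)  # same pair, water-like continuum
    print(f"vacuum: {vacuum:.1f} kJ/mol, eps_r = 80: {water:.2f} kJ/mol")
```

In this crude picture the same ion pair drops from several hundred kJ/mol in vacuum to a few kJ/mol in a high-dielectric continuum, which is why the treatment of the solvent dominates any estimate of charge-charge interactions.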

Electrostatic interactions, however, do not
only occur between charge monopoles. In a
comprehensive treatment of electrostatics one
has to consider a full power series, and
therefore interactions between higher electric
moments, such as dipoles and quadrupoles, also
play an essential role. Their interaction
energies are orientation dependent and become
shorter in range with increasing electric
moment. For example, in contrast to the $1/r$
dependency in Coulomb's law, the energy of the
interaction between a charge and a dipole
decays with $1/r^2$, the interaction between two
dipoles with $1/r^3$. This, however, is valid only
for a fixed orientation of the dipoles. If they
are mobile, as in isotropic media (liquids), the
dipole-dipole interaction is thermally averaged
and an average interaction proportional
to $1/r^6$ results. This $1/r^6$ dependency is also
encountered in interactions that arise between
induced electric moments, such as the
dispersion interaction based on London
forces. The attractive interactions between
(induced) electric multipoles are generally
summarized in the term van der Waals
interactions. Accordingly, van der Waals forces are
weak, attractive, short-range forces that decay
with $1/r^6$. These are normally described by
intermolecular interaction potentials such as
the Lennard-Jones potential:

$$E \propto \frac{A}{r^{12}} - \frac{B}{r^{6}}$$

where A and B are parameters depending on
the type of the interacting atoms. The $1/r^{12}$
term reflects the short-range repulsive
forces attributed to unfavorable spatial
overlap of electron clouds at short distances.
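
The interplay of the repulsive and attractive terms can be made concrete with a few lines of code. The following is a minimal sketch of a 12-6 potential with generic parameters A and B; the function names and the numerical values in the usage lines are illustrative assumptions, not parameters of any particular force field. Setting the derivative to zero gives the position of the minimum, $r_{min} = (2A/B)^{1/6}$, with depth $-B^2/(4A)$.

```python
# 12-6 Lennard-Jones potential, E(r) = A / r**12 - B / r**6 (illustrative sketch).

def lennard_jones(r, a, b):
    """Return the 12-6 energy at distance r for parameters a (repulsive) and b (attractive)."""
    if r <= 0:
        raise ValueError("distance must be positive")
    return a / r**12 - b / r**6


def lennard_jones_minimum(a, b):
    """Return (r_min, e_min), obtained from dE/dr = 0: r_min**6 = 2a/b, e_min = -b**2/(4a)."""
    r_min = (2.0 * a / b) ** (1.0 / 6.0)
    e_min = -b * b / (4.0 * a)
    return r_min, e_min


if __name__ == "__main__":
    a, b = 1.0, 2.0  # arbitrary units, chosen so that the minimum lies at r = 1
    r_min, e_min = lennard_jones_minimum(a, b)
    print(f"minimum at r = {r_min:.3f}, depth = {e_min:.3f}")
    print(f"energy at 0.8 * r_min: {lennard_jones(0.8 * r_min, a, b):.3f}")
    print(f"energy at 2.0 * r_min: {lennard_jones(2.0 * r_min, a, b):.4f}")
```

The steep rise on compression and the rapidly vanishing attraction at larger distances are what make the steric fit between ligand and binding site so sensitive to small displacements.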

An interaction deserving special attention 
is that of hydrogen bonds (33, 34). In principle,
their origin is of the same nature as the inter¬ 
actions mentioned above. A hydrogen bond is 
defined as the interaction of an electronega¬ 
tive atom (the hydrogen-bond acceptor) with a 
hydrogen atom covalently bonded to an elec¬ 
tronegative atom (the hydrogen-bond donor). 
The major component of a hydrogen bond is 
the electrostatic interaction of the donor-hy¬ 
drogen dipole with the negative partial charge 
of the acceptor. The special characteristics 
originate from the fact that the hydrogen 
atom is very small and can bear a considerable 
positive partial charge, such that the acceptor 
can contact the hydrogen atom at a shorter 
distance than expected from the van der Waals 
radii. Hydrogen bonds are directed interac¬ 
tions showing a high angular dependency. 
This directionality arises from the anisotropic 
charge distribution around the acceptor atom 
(lone pairs) and the fact that the electron 
shells of donor and acceptor atom start to 
overlap at these short distances unless the 
ideal geometry is maintained. Hydrogen 
bonds are attributed an important role with 
respect to specificity of the protein-ligand in¬ 
teraction. This is based on their directionality 
and the fact that they require a well-defined 
complementarity in the complex (mutual
arrangement of hydrogen-bond donors and
acceptors). However, the importance of hydrogen
bonds should not be overemphasized
because it is the balance between hydrogen
bonds and other forces in protein-ligand
complexes that must be appropriately considered
(35).

Weakly polar interactions in proteins and 
protein-ligand complexes are frequently phe¬ 
nomenologically analyzed and classified in 
terms of the interacting partners (36). This 
especially includes interactions with π-systems,
such as the NH-π, OH-π, or CH-π interaction
(37, 38), aromatic-aromatic interactions
(parallel π-π stacking versus edge-to-face
interaction), and the cation-π interaction
(39). All of these can mostly be rationalized in
terms of electrostatic interactions outlined 
above; that is, they involve interactions be¬ 
tween monopoles, dipoles, and quadrupoles 
(permanent and induced). A more distinct 
character can be attributed to metal complex- 
ation, which can play a significant role in indi¬ 
vidual cases of protein-ligand interactions, as 
for example in metalloenzymes (2, 40, 41). 

Finally, so-called hydrophobic or lipophilic 
interactions are often mentioned as an additional
contribution to protein-ligand interactions.
These terms are used to describe the preferen¬ 
tial association of nonpolar groups in aqueous 
solution. It should be emphasized, however, 
that in contrast to what the name suggests, 
there is no special hydrophobic force. Instead, 
one should speak of a hydrophobic effect. As 
further mentioned below, according to the 
generally accepted view, it arises primarily 
from the entropically favorable replacement 
and release of water molecules (42, 43). The
association between the nonpolar surfaces
itself is simply based on weak London forces
(36).

Thermodynamically, the strength of the in¬ 
teraction between a protein and a ligand is 
described by the binding affinity or (Gibbs) 
free energy of binding. Assuming a simple 
equilibrium reaction of the form 

$$P + L \rightleftharpoons PL$$

between a protein P and ligand L to give the
complex PL, the dissociation constant $K_d$ (or
binding constant K) is generally used to
describe the stability of complex formation:

$$K_d = \frac{[P][L]}{[PL]}.$$

From the experimentally measured equilib¬ 
rium constant the binding affinity can be cal¬ 
culated as 

$$\Delta G^\circ = RT \ln K_d$$

where R is the gas constant (8.314 J/(mol K))
and T is the temperature [the equilibrium
constant would actually have to be related to a
standard concentration to become a
dimensionless quantity, but in general this is not
explicitly considered (44, 45)]. Experimentally
determined binding constants $K_i$ ($K_d$) are
typically in the range of $10^{-2}$ to $10^{-12}$ M,
corresponding to a Gibbs free energy of binding of
roughly -10 to -70 kJ/mol (1, 2).
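
The conversion between a measured dissociation constant and the free energy of binding is easily checked numerically. The sketch below assumes a temperature of 298.15 K and a standard concentration of 1 M (neither is specified in the text), and the function names are chosen for illustration only.

```python
import math

# Conversion between a dissociation constant and the standard free energy of binding,
# Delta G = R * T * ln(K_d), with K_d taken relative to an assumed 1 M standard state.

GAS_CONSTANT = 8.314  # J/(mol K)


def binding_free_energy(kd_molar, temperature=298.15):
    """Return Delta G in kJ/mol for a dissociation constant given in mol/L."""
    if kd_molar <= 0:
        raise ValueError("K_d must be positive")
    return GAS_CONSTANT * temperature * math.log(kd_molar) / 1000.0


def dissociation_constant(delta_g_kj, temperature=298.15):
    """Inverse relation: return K_d in mol/L for a Delta G given in kJ/mol."""
    return math.exp(delta_g_kj * 1000.0 / (GAS_CONSTANT * temperature))


if __name__ == "__main__":
    for kd in (1e-2, 1e-6, 1e-9, 1e-12):
        print(f"K_d = {kd:.0e} M  ->  Delta G = {binding_free_energy(kd):6.1f} kJ/mol")
```

At this temperature each order of magnitude in $K_d$ corresponds to about 5.7 kJ/mol, so the quoted range of ten orders of magnitude spans roughly 57 kJ/mol.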

According to the Gibbs-Helmholtz equa¬ 
tion, the free energy of binding consists of an 
enthalpic and an entropic contribution: 

$$\Delta G = \Delta H - T \Delta S$$

The enthalpy and entropy of binding can be
determined experimentally, as, for example,
by isothermal titration calorimetry (46, 47).
These data, however, are still sparse and not
always easy to interpret (48, 49). Substantial
compensation between enthalpic and entropic
contributions is observed (50-52);
this phenomenon and its interpretations
have recently been critically reexamined
(53). Interestingly, the data also show that
binding can be either enthalpy-driven (e.g.,
streptavidin-biotin, $\Delta G$ = -76.5 kJ/mol, $\Delta H$
= -134 kJ/mol) or entropy-driven (e.g.,
streptavidin-HABA, $\Delta G$ = -22.0 kJ/mol, $\Delta H$
= +7.1 kJ/mol) (54). However, because of
strong temperature dependencies, even this
partitioning depends on the temperature
at which the measurement is made.
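
The entropic term implied by the two streptavidin examples follows directly from the Gibbs-Helmholtz relation, $-T\Delta S = \Delta G - \Delta H$. The short sketch below performs this bookkeeping; the function names and the simple rule used to label a complex as enthalpy- or entropy-driven (whichever of $\Delta H$ and $-T\Delta S$ is more favorable) are assumptions made for illustration.

```python
# Partitioning of the binding free energy into enthalpic and entropic terms,
# using Delta G = Delta H - T * Delta S (all values in kJ/mol).

def entropic_term(delta_g, delta_h):
    """Return -T * Delta S, i.e. the part of Delta G not accounted for by Delta H."""
    return delta_g - delta_h


def driving_force(delta_g, delta_h):
    """Label binding by the more favorable (more negative) of Delta H and -T * Delta S."""
    minus_t_delta_s = entropic_term(delta_g, delta_h)
    return "enthalpy-driven" if delta_h < minus_t_delta_s else "entropy-driven"


if __name__ == "__main__":
    examples = {
        "streptavidin-biotin": (-76.5, -134.0),  # (Delta G, Delta H) as quoted in the text
        "streptavidin-HABA": (-22.0, 7.1),
    }
    for name, (dg, dh) in examples.items():
        print(f"{name}: -T*dS = {entropic_term(dg, dh):+.1f} kJ/mol, {driving_force(dg, dh)}")
```

For biotin the large favorable enthalpy is partly offset by an unfavorable entropic term of about +57.5 kJ/mol, whereas for HABA the entropic term of about -29.1 kJ/mol more than compensates the slightly unfavorable enthalpy.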

Figure 7.1. Overview of the receptor-ligand binding process. All species involved are solvated by
water (symbolized by gray spheres). The binding free energy difference between the bound and
unbound state is a sum of enthalpic components (breaking and formation of hydrogen bonds,
formation of specific hydrophobic contacts) and entropic components (release of water from hydrophobic
surfaces to solvent, loss of conformational mobility of receptor and ligand).

What are the major contributions to the
enthalpy and entropy of binding? Direct
interactions between the protein and the ligand are
obviously very important for the enthalpy of
binding. Besides that, an essential factor is
that protein-ligand interactions occur in
aqueous solution (cf. Fig. 7.1). The unbound
reaction partners are solvated and partial de¬ 
solvation is required before complex forma¬ 
tion can occur. This is important for the en¬ 
thalpy balance because the net energy gain 
upon complexation can only be the difference 
between the direct protein-ligand interaction 
enthalpy and the desolvation enthalpies of the 
two molecules. In this context, the hydrophobic
effect has to be considered again. Upon the
formation of lipophilic contacts between apolar
parts of the protein and the ligand,
unfavorably ordered water molecules are replaced
and released. This leads to an entropy gain
that is attributed to the fact that the water
molecules are no longer positionally confined.
In addition, there is an enthalpic contribution:
water molecules occupying lipophilic binding
sites are unable to form hydrogen bonds with
the protein, but after release they can form
strong hydrogen bonds with bulk water. Because
the removal of hydrophobic surfaces
from contact with water leads to negative
changes in the heat capacity ($\Delta C_p$), the buried
hydrophobic surface area has frequently been
correlated with $\Delta C_p$ values measured upon
ligand binding. This, however, may be an
oversimplification, neglecting other potential
contributions to $\Delta C_p$ (55). As further noted by
Tame, enthalpy-entropy compensation and
the temperature dependency of $\Delta H$ and $T \Delta S$
(which are both directly related to $\Delta C_p$) make
it ultimately impossible to consider polar or
apolar contributions as purely enthalpic or
entropic, respectively (56).

Entropically unfavorable contributions 
arise from the loss of translational and rota¬ 
tional degrees of freedom upon complexation, 
whereas a small gain in entropy can result 
from low-frequency concerted vibrations in 
the complex. A more important factor to con¬ 
sider in an actual design process is conforma¬ 
tional flexibility. Upon binding, internal de¬ 
grees of freedom are frozen, the ligand loses a 
considerable amount of its flexibility, and usu¬ 
ally binds in one single orientation. This is 
also the explanation why rigid analogs of flex¬ 
ible ligands show higher affinity, as, for exam¬ 
ple, observed for cyclic derivatives of ligands 
that adopt the same binding mode as the open- 
chain derivative (57, 58). Accordingly, higher 
affinity also results if the protein-bound li¬ 
gand conformation is already preorganized in 
solution. 

From a variety of experiments, quantita¬ 
tive estimates for some of the mentioned en¬ 
ergetic contributions to protein-ligand bind¬ 
ing could be derived. Based on data from 
protein mutants, the contribution of individ¬ 
ual hydrogen bonds to the binding affinity has 
been estimated to be 5 ± 2.5 kJ/mol (59-62). 
This is similar to what has been obtained for 
the contribution of an intramolecular hydro¬ 
gen bond to protein stability (63, 64). The con¬ 
sistency of values derived from different pro¬ 
teins suggests some degree of additivity in the 
hydrogen-bonding interactions. The accurate 
description of the interplay with water mole¬ 
cules remains, however, a most challenging 
task. The contribution of hydrogen bonds to 
the overall affinity strongly depends on local 
solvation and desolvation effects and can 
sometimes be very small or even adverse to 
binding, as illustrated by the comparison of 
ligand pairs differing by just one hydrogen 
bond (65). Charge-assisted hydrogen bonds
are stronger than neutral ones, but also
associated with a higher desolvation penalty.
Thus, the electrostatic interaction of an
exposed salt bridge contributes as much as a
neutral hydrogen bond (5 ± 1 kJ/mol according
to Ref. 66), but the same interaction in the
interior of a protein can be significantly
stronger (67). Because of the complicated interplay
with water, a detailed analysis of the
thermodynamics of hydrogen bond formation can
sometimes yield surprising results. For a
particular hydrogen bond in complexes of the
FK506-binding protein, it has been found that
its formation is enthalpically unfavorable but
entropically favorable (60). The entropy gain
appears to be attributable mainly to the
replacement of two water molecules (68).

Contributions from hydrophobic interactions
have frequently been found to be proportional
to the lipophilic surface area buried
from solvent, with values in the range of
80-200 J/(mol Å²) (69-71). The entropic penalty
for freezing a single rotatable bond has been 
estimated to be 1.6-3.6 kJ/mol at 300 K (72, 
73); recent estimates derived from NMR shift 
titrations are much lower (0.5 kJ/mol) (74), 
but in the systems studied the conformational 
restriction may not have been as high as in a 
protein binding site. Finally, the unfavorable 
entropy contribution from the loss of transla¬ 
tional and orientational degrees of freedom 
has been estimated to be around 10 kJ/mol 
(75, 76). 
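
To see how such increments add up, they can be combined in a back-of-the-envelope estimate. The sketch below is purely illustrative: the single values picked from within the quoted ranges, the function name, and the hypothetical ligand in the usage lines are assumptions, and the result should not be read as a validated scoring function.

```python
# Back-of-the-envelope additive estimate of a binding free energy (kJ/mol).
# Illustrative increments, picked from within the ranges quoted in the text.
HBOND_GAIN = -5.0              # per protein-ligand hydrogen bond (5 +/- 2.5 kJ/mol)
LIPOPHILIC_GAIN_PER_A2 = -0.1  # per A^2 of buried lipophilic surface (80-200 J/(mol A^2))
ROTOR_PENALTY = 2.5            # per frozen rotatable bond (1.6-3.6 kJ/mol at 300 K)
RIGID_BODY_PENALTY = 10.0      # loss of translational and orientational freedom


def additive_binding_estimate(n_hbonds, buried_lipophilic_area, n_frozen_rotors):
    """Sum the individual increments; every term is assumed independent of the others."""
    if min(n_hbonds, buried_lipophilic_area, n_frozen_rotors) < 0:
        raise ValueError("counts and areas must be non-negative")
    return (
        n_hbonds * HBOND_GAIN
        + buried_lipophilic_area * LIPOPHILIC_GAIN_PER_A2
        + n_frozen_rotors * ROTOR_PENALTY
        + RIGID_BODY_PENALTY
    )


if __name__ == "__main__":
    # Hypothetical ligand: 3 hydrogen bonds, 300 A^2 buried lipophilic surface, 4 frozen rotors.
    estimate = additive_binding_estimate(3, 300.0, 4)
    print(f"estimated Delta G: {estimate:.1f} kJ/mol")
```

Because each increment carries an uncertainty comparable to its size, the spread of such an estimate over the quoted ranges is easily 10 kJ/mol or more, that is, about two orders of magnitude in the binding constant.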

Despite many inconsistencies and difficul¬ 
ties in interpretation, most of the experimen¬ 
tal data suggest that simple additive models of 
protein-ligand interactions might be a reason¬ 
able starting point for the development of 
methods to predict binding affinities, that is, 
for the derivation of empirical scoring func¬ 
tions. Still, it has to be kept in mind that the 
assumption of additivity in biochemical phe¬ 
nomena is not strictly valid (77). On the other 
hand, the large body of experimental data on 
3D structures of protein-ligand complexes 
and binding affinities allows one to derive 
some general characteristics about protein-li¬ 
gand interactions. Several features are com¬ 
monly found in complexes of tightly binding 
ligands: 

1. A high steric complementarity between the
protein and the ligand, an observation often
described as the lock-and-key paradigm (78,
79). This complementarity, however, is
frequently not the result of a match between
rigid bodies, but rather achieved through
significant conformational changes of both
binding partners, a phenomenon generally
referred to as induced fit. Additionally,
electrostatic complementarity can also be
induced, for example, by strong p$K_a$ shifts upon
ligand binding that result in the release or
uptake of protons by different functional
groups either of the protein or the ligand.

2. A high complementarity of the surface 
properties. Lipophilic parts of the ligands 
are generally in contact with lipophilic 
parts of the protein, whereas polar groups 
are usually paired with suitable polar pro¬ 
tein groups to form hydrogen bonds or 
ionic interactions. 

3. An energetically favorable conformation of
the bound ligand. Significant conformational
strain is usually not observed in
ligands binding with high affinity.

In addition to insights taken from high-affin¬ 
ity complexes, experimental information about 
weakly bound complexes could be equally in¬ 
structive. Such information has indeed been 
recognized to be vital for the development of 
scoring functions (80). Structural data on unfa¬ 
vorable protein-ligand interactions, however, 
are sparse, partly because structures of weakly 
binding ligands are more difficult to obtain and 
are usually considered less interesting by many 
structural biologists. What can be concluded
from the available data is that an imperfect
steric fit at the lipophilic part of the
protein-ligand interface leads to reduced binding affinity
and that unpaired buried polar groups at the
protein-ligand interface are strongly adverse to
binding. Few buried CO and NH groups in
folded proteins fail to form hydrogen bonds (81).
Therefore, an important prerequisite in the
ligand design process is that polar
functional groups, either of the protein or the
ligand, find suitable counterparts if they
become buried on ligand binding.

2.2 Docking, Scoring, and Virtual Screening:
The Basic Concepts

The subject of docking is the formation of
noncovalent protein-ligand complexes. Given the
structures of a ligand and a protein, the task is
to predict the structure of the resulting com¬ 
plex. This is the so-called docking problem. Be¬ 
cause the native geometry of the complex can 
generally be assumed to reflect the global min¬ 
imum of the binding free energy, docking is 
actually an energy-optimization problem (82), 
concerned with the search of the lowest free 
energy binding mode of a ligand within a pro¬ 
tein binding site. The macromolecular nature 
of the protein and the fact that binding occurs 
in aqueous solution complicate the problem 
significantly because of the high dimensional¬ 
ity of the configuration space and considerable 
complexity of the energetics governing the in¬ 
teraction. Accordingly, heuristic approxima¬ 
tions are frequently required to render the 
problem tractable within a reasonable time 
frame. The development of docking methods is 
therefore also concerned with making the 
right assumptions and finding acceptable sim¬ 
plifications that still provide a sufficiently ac¬ 
curate and predictive model for protein-ligand 
interactions. 

Regardless of the nature of the interacting 
partners, computational docking always re¬ 
quires two components, which may briefly be 
characterized as "searching" and "scoring" 
(83). "Searching" refers to the fact that any 
docking method has to explore the configuration
space accessible for the interaction
between the two molecules. The goal of this
exploration is to find the orientation and
conformation of the interacting molecules cor¬ 
responding to the global minimum of the free 
energy of binding. Unless the degrees of free¬ 
dom are restricted to translation and rotation 
by treating both molecules as rigid bodies, a 
full systematic search of all “dockings” is nor¬ 
mally not feasible because of the huge number 
of potential solutions and the large amount of 
computational resources needed to evaluate 
them. Different strategies are therefore re¬ 
quired, which should be accurate and efficient: 
accurate in the sense that the optimization 
procedure should not miss any valuable solu¬ 
tion (near-global minima), and efficient in 
terms of computing time and with respect to 
the fact that the algorithm should not spend 
unnecessary time by exploring irrelevant re¬ 
gions or by rediscovering previously detected 
local minima. As will be elaborated in the next
section, there are two opposing approaches to
simplify the docking problem: either
reformulating it as a discrete problem that can be
solved with combinatorial algorithms, or
using stochastic search algorithms.

"Scoring" refers to the fact that any dock¬ 
ing procedure must evaluate and rank the con¬ 
figurations generated by the search process. 
The scoring scheme most closely related to ex¬ 
periment, the ab initio calculation of the free 
energy of binding, is not easily accessible to 
computation. Hence, approximate scoring 
functions must be used that model the binding 
free energy with sufficient accuracy and corre¬ 
late well with experimental binding affinities. 
In particular, the scoring function should be 
able to discriminate between native and non¬ 
native binding modes. 
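
The division of labor between "searching" and "scoring" can be illustrated with a deliberately small toy problem. The sketch below is not taken from any docking program: it assumes a rigid ligand that is only translated (rotations and torsions are omitted), a toy 12-6 pair score in arbitrary units, and a Metropolis Monte Carlo search; all names and parameter values are hypothetical.

```python
import math
import random

R0 = 4.0  # assumed optimal pair distance in Angstrom (illustrative)


def score(ligand, site):
    """Scoring: toy 12-6 pair score summed over all ligand/site atom pairs."""
    total = 0.0
    for lx, ly, lz in ligand:
        for sx, sy, sz in site:
            r2 = max((lx - sx) ** 2 + (ly - sy) ** 2 + (lz - sz) ** 2, 1e-6)
            inv6 = (R0 * R0 / r2) ** 3
            total += inv6 * inv6 - 2.0 * inv6  # minimum of -1 per pair at r = R0
    return total


def translate(ligand, shift):
    """Rigid-body translation of all ligand atoms by the same vector."""
    dx, dy, dz = shift
    return [(x + dx, y + dy, z + dz) for x, y, z in ligand]


def monte_carlo_search(ligand, site, steps=2000, step_size=0.5, kt=0.5, seed=0):
    """Searching: random translations accepted by the Metropolis criterion."""
    rng = random.Random(seed)
    current = list(ligand)
    current_score = score(current, site)
    best, best_score = current, current_score
    for _ in range(steps):
        shift = tuple(rng.uniform(-step_size, step_size) for _ in range(3))
        trial = translate(current, shift)
        trial_score = score(trial, site)
        delta = trial_score - current_score
        # Always accept improvements; accept deteriorations with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kt):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


if __name__ == "__main__":
    site = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
    ligand = [(10.0, 10.0, 10.0), (11.5, 10.0, 10.0)]
    pose, best = monte_carlo_search(ligand, site)
    print(f"start score {score(ligand, site):.3f}, best score {best:.3f}")
```

The two functions are independent of each other: the same search could be run with a different scoring function, and the same score could be optimized by a different search strategy.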

Scoring is actually composed of three dif¬ 
ferent aspects relevant to docking and design: 

1. Ranking of the configurations generated by 
the docking search for one ligand interact¬ 
ing with a given protein; this aspect is es¬ 
sential to detect the binding mode best ap¬ 
proximating the experimentally observed 
situation. 

2. Ranking different ligands with respect to 
the binding to one protein, that is, priori¬ 
tizing ligands according to their affinity; 
this aspect is essential in virtual screening. 

3. Ranking one or different ligands with re¬ 
spect to their binding affinity to different 
proteins; this aspect is essential for the con¬ 
sideration of selectivity and specificity. 

If one were able to accurately calculate the 
free energy of binding, all three aspects would 
be satisfied simultaneously. Current scoring 
functions used in docking programs, however, 
can usually resolve satisfactorily only the first 
aspect. They provide only a rough estimate 
with respect to the comparison across differ¬ 
ent ligand or protein systems. This is the case 
whenever the scoring scheme neglects certain 
factors that are virtually constant for different 
binding modes with respect to one protein, but 
that matter for comparisons with other pro¬ 
teins. 

Following the general paradigm shift in
structure-based design from single
compounds to compound libraries, state-of-the-art
docking and scoring methods have to be
sufficiently fast to be applied for virtual screening.
The general strategy of a virtual screening 
process based on the 3D structure of a target 
typically involves the following steps: 

1. Analysis of the 3D protein structure.

2. Selection of key interactions that need to
be satisfied by all candidate molecules.

3. Computational search in chemical databases
for compounds that potentially satisfy
the key interactions, fit into the binding
site, and form additional interactions
with the protein; this is done by means of
docking and/or structure-based
pharmacophore searches.

4. Postprocessing by analyzing the retrieved
hits and removing undesirable compounds.

5. Synthesis or ordering of the selected
compounds.

6. Biological testing, possibly followed by
crystallographic confirmation.

All these steps are discussed in more
detail in a later section. Of primary interest
in the context of this chapter is step 3. It
requires high-throughput docking with effi¬ 
cient search algorithms, and scoring functions 
that are able to provide a good separation be¬ 
tween potentially "binding" and "nonbind¬ 
ing" ligands. The database or library that is 
screened should consist of a sufficiently large 
and diverse set of relevant compounds. Thus, 
library design is increasingly applied to ensure 
that only reasonably preselected compounds 
are docked (29, 84, 85). 
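
The computational core of such a screening campaign (prefiltering, docking and scoring, postprocessing) can be summarized in a short skeleton. This is a sketch under stated assumptions only: the `Compound` type, the molecular-weight cutoff of 500 g/mol, and the `dock_score` callable standing in for a real docking program are all hypothetical, and lower scores are assumed to indicate better predicted binding.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class Compound(NamedTuple):
    name: str
    mol_weight: float  # g/mol


def virtual_screen(
    library: Iterable[Compound],
    dock_score: Callable[[Compound], float],
    max_weight: float = 500.0,
    top_n: int = 10,
) -> List[Tuple[float, Compound]]:
    """Filter a library, score every remaining compound, and return the best-ranked hits."""
    # Preprocessing: discard compounds that fail a simple property filter.
    candidates = [c for c in library if c.mol_weight <= max_weight]
    # Docking and scoring: delegated to the supplied callable.
    scored = [(dock_score(c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0])
    # Postprocessing: keep only the top of the ranked list for closer inspection.
    return scored[:top_n]


if __name__ == "__main__":
    library = [Compound("cpd-1", 320.0), Compound("cpd-2", 610.0), Compound("cpd-3", 455.0)]
    fake_scores = {"cpd-1": -21.0, "cpd-2": -40.0, "cpd-3": -33.5}  # placeholder values
    for value, compound in virtual_screen(library, lambda c: fake_scores[c.name], top_n=2):
        print(compound.name, value)
```

Applying the inexpensive filter before docking means that the costly step is spent only on compounds that could be selected at all, which is the rationale for preselecting the library.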

3 DOCKING 

In this section, approaches to the docking 
problem are presented with respect to the 
docking algorithm and the search aspect.
Scoring is discussed separately in Section 4. It 
should be noted in this context that, although
a specific docking method is frequently
associated with a certain scoring procedure, many
docking methods could in principle be
combined with a variety of different scoring
functions, either for postprocessing of the results
or as objective function during the
optimization. Actually, such strategies are followed by
considering multiple scoring schemes to 
achieve "consensus scoring" (86) or "multidi¬ 
mensional scoring" (87). The emphasis in this 
section is on general characteristics and prin¬ 
ciples, rather than individual methods, al¬ 
though occasionally specific docking programs 
have been selected as representative examples 
for a more detailed illustration of a general 
concept. The interested reader is referred to 
Table 7.1 for an overview of currently used 
docking programs described in the literature. 
In addition, a valuable source of information is 
the corpus of regularly published reviews in 
the field of docking (18, 19, 26, 27, 83, 88-95).

3.1 General Concepts to Address the 
Docking Problem 

Essential for any docking method is a search
algorithm that samples the configuration
space of two interacting molecules. These
molecules need to be represented in a way that is
suitable for efficient handling by the search
algorithm. Docking methods may therefore
roughly be classified by the way the
macromolecular receptor is represented (Section 3.1.1),
by the handling of the ligand (Section 3.1.2),
and, most important, by the search
algorithm itself (Section 3.1.3).


3.1.1 Representation of the Macromolecular
Receptor. The most straightforward
approach for representing the macromolecular
structure in a docking application would be by
atomic coordinates of the entire protein. A full
atomic representation, however, is generally
impractical because of the size and complexity
of protein structures. The structural information
therefore needs to be reduced to a
manageable yet representative size and form.

A first step in this direction is to limit the
search area to the region surrounding the
putative binding site. This is general practice in
protein-ligand docking (whereas in
protein-protein docking often the entire surfaces are
searched for appropriate matches). Scanning
of the entire surface for potential binding
regions of a small-molecule ligand would hardly
be feasible with most docking methods.
Furthermore, it would be rather unreasonable to
ignore information already available from
biochemical experiments or structural data of
related complexes. If no such information is
available [a situation that we may increasingly 
be facing as a consequence of the effects of 
the structural genomics initiatives (17, 96)], 
methods to identify binding sites are required 
before the actual docking process can start. 
Examples are programs for geometric cavity 
detection, such as LIGSITE (97) or PASS (98), 
tools to infer protein function from structural 
homologies (99, 100), or more sophisticated 
approaches based on a physicochemical and 
geometrical characterization of binding sites 
(101). Some docking programs incorporate 
routines for binding site identification as pre¬ 
processing steps (102). 

Despite a reduction to only a specified part 
of the protein surface, a simple representation 
in terms of atomic coordinates is not practical 
for most docking procedures. Instead, the 
space available for ligand binding is frequently 
characterized by other means that permit 
more efficient searches. A first alternative is 
given by geometric shape descriptors, some¬ 
times combined with a physicochemical de¬ 
scription. Approaches of this class include mo¬ 
lecular surface cubes (103), surface normals at 
sparse critical points (104), and modified
Lee-Richards dotted surfaces, with each dot coded
by chemical property and accessibility (105). A 
further prominent example is the sphere 
images of the binding site used in DOCK (106, 
107). These spheres are complementary to the 
molecular surface and represent a space-fill¬ 
ing negative image of the binding site. An¬ 
other important concept that goes beyond a 
pure geometric description and represents in¬ 
teraction properties of physicochemical rele¬ 
vance is the usage of interaction sites or 
points, as introduced by the program LUDI 
(108, 109). These interaction sites are discrete
positions and vectors in space serving as
dummy representations for atoms capable of
forming hydrogen bonds or filling hydrophobic
pockets. The docking tool FlexX is based on
this concept (110). Also, the program SLIDE 
(111) and the new approach by Diller and 
Merz (112) use interaction points for fast 
docking. 

Table 7.1 Overview of Currently Used Programs for Protein-Ligand Docking

Class of Docking Method (a)        Name of Program    Year       Original    Selected References to Further
                                                      Published  References  Developments and Applications

Geometric/combinatorial
  Shape/descriptor matching        DOCK               1982       (106)       (11, 127, 143, 371)
                                   FLOG               1994       (121)       (366)
                                   ADAM               1994       (135)       (386)
                                   LIGIN              1996       (387)       (234)
                                   SANDOCK            1998       (105)       (13)
                                   QSDOCK             2000       (136)
                                   SLIDE              2000       (111)
                                   FRED               2001       (123)
                                   (Diller and Merz)  2001       (112)
  Incremental construction         FlexX              1996       (110)       (130, 138, 139, 216, 233)
                                   Hammerhead         1996       (388)
                                   DOCK4.0            1998       (328)       (131)
  Systematic search                EUDOC              2001       (125)       (389)
    (transl. + rot.)

Energy-driven/stochastic
  Monte Carlo simulated annealing  AutoDock           1990       (113)       (95, 115, 390)
                                   RESEARCH           1992       (146)       (145)
                                   MCDOCK             1999       (147)
  Monte Carlo minimization         ICM                1994       (116)       (82, 117, 201)
                                   (Caflisch et al.)  1997       (150)       (151)
                                   QXP                1997       (152)
                                   PRODOCK            1998       (119)       (118)
  Molecular dynamics (MD)          MDD                1994       (164)       (165)
                                   (Luty et al.)      1995       (169)
                                   (Vieth et al.)     1998       (166)
                                   q-jumping MD       2000       (167)       (168)
  Genetic algorithm                GOLD               1995       (176)       (177)
                                   AutoDock3.0        1998       (115)       (208, 228, 391)
                                   GAMBLER            1999       (86)
                                   DARWIN             2000       (178)
  Tabu search                      PRO_LEADS          1998       (188)       (189, 360)
  Tabu search + genetic algorithm  SFDock             1999       (392)
  Eigenvector following            Low Mode Search    1999       (211)
  Mining Minima algorithm          Mining Minima      2001       (190)

(a) The classification provided in the first column can only be approximate for programs that offer a variety of different
functionalities or follow multistep strategies.

A popular alternative to geometric or
physicochemical descriptors is the grid representation
of protein structures. The general principle
of this approach is that the protein is
represented by a set of affinity grids or maps
that cover the entire search region. These
regularly spaced, orthogonal grids are calculated
before the actual docking process. At every
grid point, some sort of scoring value or
interaction energy of a probe atom with the entire
protein is calculated, providing a map of
pseudo-affinities for each atom type or interaction
type possibly present in the ligands to be
docked. These maps then serve as look-up
tables for the calculation of the interaction
energy or scoring value during the docking
process. Examples of docking programs using this
approach are AutoDock (113-115), ICM (82,
116, 117), or ProDock (118, 119).
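
The look-up principle can be sketched in a few lines. The example below is a simplified illustration and does not reproduce the grid format or energy function of any of the programs named above: it assumes a single probe type, a toy 12-6 pair energy, and nearest-grid-point look-up instead of interpolation; all function names are hypothetical.

```python
# Precomputed affinity grid used as a look-up table (illustrative sketch).

def pair_energy(r2, r0=4.0):
    """Toy 12-6 pair energy (arbitrary units) as a function of the squared distance."""
    r2 = max(r2, 1e-6)  # guard against division by zero at an atom position
    inv6 = (r0 * r0 / r2) ** 3
    return inv6 * inv6 - 2.0 * inv6


def build_grid(protein_atoms, origin, spacing, shape):
    """Evaluate the probe energy with the whole protein once at every grid point."""
    nx, ny, nz = shape
    grid = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                x = origin[0] + i * spacing
                y = origin[1] + j * spacing
                z = origin[2] + k * spacing
                grid[i][j][k] = sum(
                    pair_energy((x - ax) ** 2 + (y - ay) ** 2 + (z - az) ** 2)
                    for ax, ay, az in protein_atoms
                )
    return grid


def lookup(grid, origin, spacing, point):
    """Return the stored value at the grid point nearest to the given coordinates."""
    index = [round((p - o) / spacing) for p, o in zip(point, origin)]
    sizes = (len(grid), len(grid[0]), len(grid[0][0]))
    if any(not 0 <= n < size for n, size in zip(index, sizes)):
        raise ValueError("point lies outside the grid")
    i, j, k = index
    return grid[i][j][k]


def score_ligand(grid, origin, spacing, ligand_atoms):
    """Score a ligand pose by summing table look-ups, one per ligand atom."""
    return sum(lookup(grid, origin, spacing, atom) for atom in ligand_atoms)


if __name__ == "__main__":
    protein = [(0.0, 0.0, 0.0)]
    origin, spacing, shape = (-5.0, -5.0, -5.0), 1.0, (11, 11, 11)
    grid = build_grid(protein, origin, spacing, shape)  # done once, before docking
    pose = [(4.2, 0.1, -0.3), (0.0, 4.0, 0.0)]
    print(f"pose score: {score_ligand(grid, origin, spacing, pose):.3f}")
```

The cost of scoring a pose then scales with the number of ligand atoms only, because the sum over all protein atoms has been moved into the one-time grid calculation; the price is that the protein is frozen in the conformation used to build the grid.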

It should be noted that most of the mentioned representations of
protein structure imply that the protein remains rigid during the
docking process. As a matter of fact, docking under the assumption of
a rigid protein is still common practice in standard applications.
Although an acceptable simplification under certain circumstances, it
can represent a serious limitation if only unbound protein structures
are available. As a consequence, the inclusion of protein flexibility
in the docking process is an active area of research, and a separate
section is dedicated to this issue (cf. Section 3.2.1).


3.1.2 Ligand Handling. For the ligand, a complete representation in
atomic coordinates is perfectly feasible. Ligand atoms may be used
directly for matching with binding site descriptors or in the
calculation of interaction energies in the case of energy-driven
procedures. The central problem is conformational flexibility.
Predicting the binding conformation of a ligand is in fact a major
component of the docking problem, given that this conformation can
significantly differ from that adopted in other environments.

Two general strategies for ligand handling may be distinguished:
whole-molecule approaches and fragment-based methods. In the first
case, the ligand is docked as an entire molecule. This is rather
straightforward if the ligand is treated as a rigid body and only
translational and rotational degrees of freedom are considered. Such
rigid docking was common practice in early docking algorithms (106,
120). A straightforward extension to account for flexibility is to
separately dock precalculated conformers of a given molecule (variant
1 in Fig. 7.2). Explicit docking of multiple conformers has, for
example, been implemented in the FLOG program (121). FLOG deals with
conformational flexibility by generating different conformers using
distance geometry and docking each conformer in a rigid-body fashion.
A similar approach has also been implemented in the DOCK program
(122). To avoid redundancy in the docking, a common rigid fragment is
identified, which is docked only once for the entire set of
pregenerated conformers. The flexible portions of the molecule that
determine the different conformations are subsequently scored based on
the preplacement of the rigid fragment. Yet other examples of rigid
docking of multiple conformers are provided by the programs FRED from
OpenEye Scientific Software, which performs a fast exhaustive search
over all possible orientations (123), and SYSDOC (124) or EUDOC (125),
which use fast affine transformation to perform systematic searches
over the translational and rotational degrees of freedom of the
ligand.

Although this multi-conformer docking can be efficient and accurate
for molecules with a limited number of discrete, low-energy
conformations, it is less suited for larger and highly flexible
molecules, simply because the number of possible conformations
increases dramatically. Another way of partially accounting for
conformational flexibility in whole-molecule rigid-body docking is to
subject the initial matches to some kind of optimization that allows
for conformational relaxation. This could be done with some standard
energy minimization technique (126, 127) or other procedures that
resolve clashes of the initial placement by rotation about single
bonds, as done, for example, in the docking program SLIDE (111).

A more rigorous treatment of ligand flexibility in whole-molecule
docking is performed by sampling ligand conformation space during
docking (variant 2 in Fig. 7.2). It normally requires ligand
conformational energies to be evaluated besides intermolecular
interaction energy. Molecular mechanics force fields are frequently
applied for this purpose. Although a more exhaustive sampling of
accessible conformations within the binding site is definitely
achieved, an obvious disadvantage is the higher computational demand
and possibly a reduced efficiency of the algorithm because of lengthy
exploration of local minima.

[Figure 7.2. Strategies for flexible ligand docking. Alternative
strategies for flexible ligand docking: (1) separate conformer
generation and rigid-body docking; (2) simultaneous optimization of
orientation and conformation (simulated annealing, GA); (3) placement
of anchor fragment followed by incremental construction.]

An interesting variant of whole-molecule representations is the use of
internal coordinates instead of Cartesian coordinates (82). Internal
coordinates help to reduce the number of variables defining the
conformation of the molecular system. In Cartesian space, three
functionally equivalent variables per atom are required. Internal
coordinates, instead, consist of bond lengths, bond angles, and
torsion angles. Because bond lengths and angles can be considered
rigid to a good approximation, only the torsion angles matter as
variables to map conformation space. An efficient implementation of
docking algorithms operating on internal coordinates has been
achieved, for example, with the ICM method (82, 116, 117).
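
To make the reduction in variables concrete, the following Python
sketch compares the number of Cartesian variables with the number of
torsional variables (plus six rigid-body degrees of freedom for
position and orientation) and shows how a torsion angle is obtained
from four atomic positions. It is an illustrative sketch only; the
function names are hypothetical and it is not taken from ICM. The sign
convention of the angle is an assumption, and conventions differ
between programs.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cartesian_variables(n_atoms):
    """Three coordinates per atom."""
    return 3 * n_atoms

def torsional_variables(n_rotatable_bonds):
    """Rigid-body translation and rotation (6) plus one angle per bond."""
    return 6 + n_rotatable_bonds

def torsion_angle(p1, p2, p3, p4):
    """Torsion angle (degrees) about the p2-p3 bond; 0 = cis, 180 = trans."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    b2_len = math.sqrt(_dot(b2, b2))
    m1 = _cross(n1, tuple(c / b2_len for c in b2))
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))
```

For a hypothetical ligand with 40 atoms and 8 rotatable bonds, this
gives 120 Cartesian variables against 14 variables in the rigid-bond
torsional description.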

Fragment-based techniques are an alternative to whole-molecule docking
(variant 3 in Fig. 7.2). Here, the molecule is dissected into
fragments that can be docked individually in a rigid fashion. The
fragments can either be docked separately and then reconnected, or the
ligand is built up incrementally following a certain fragmentation
scheme. The first variant is very common to programs dedicated to de
novo design rather than pure docking. Methods of this class have been
reviewed extensively (26, 27). However, the approach has also been
applied for docking (128) and compared to the whole-molecule docking
approach (129).

The other variant of fragment-based ligand docking is used in
incremental construction algorithms (110, 130), sometimes also
referred to as "anchor and grow" (131). These search strategies are
further described below. They dissect the ligand into modular portions
and rebuild it incrementally within the binding site starting from the
docking position of a suitable base fragment. The advantage is that
many potential combinations are eliminated early in the construction,
but success critically depends on the selection and placement of the
base fragment.

3.1.3 Strategies for Searching the Configuration and Conformation
Space. Search strategies of automated docking procedures may roughly
be classified as geometric or combinatorial on the one hand and energy
driven or stochastic on the other, although ultimately all methods try
to optimize a function that models to some extent the free energy of
binding.

3.1.3.1 Geometric/Combinatorial Search Strategies. Most of the early
docking methods were entirely based on the concept of shape
complementarity. To this day, this remains the fundamental idea in
most protein-protein docking programs. The observation that
protein-ligand complexes frequently show a remarkable shape fit of
both binding partners has stimulated the conception of surface or
descriptor matching as a docking search technique. The molecules are
represented by geometric and/or physicochemical descriptors and
various alignment procedures are applied to match complementary parts
of ligand and protein. An example is the original DOCK method, where
the ligand is superimposed onto a negative sphere image of the binding
pocket, using a distance matching algorithm followed by least-squares
fitting (106, 132). Other examples are the least-squares fitting
procedure described by Bacon and Moult to achieve matches between
complementary surface patterns (133), or the hierarchical search of
geometrically compatible triplets of surface normals on the molecules
to be docked, as proposed by Wallqvist and Covell (134). The program
ADAM performs a complete combinatorial search over all possible
matches between hydrogen bond patterns (135). Recently, a new matching
algorithm based on so-called quadratic shape descriptors has been
described (QSDock); along with the presentation of their method, the
authors also provide an extensive discussion of shape-based docking
algorithms (136).

Another recent example of descriptor matching is SLIDE, developed as a
tool for ligand database screening by docking (111). The binding site
is represented by a template of favorable interaction points onto
which ligand atoms are matched during the search. Instead of serving
as a purely geometric description, these points address four different
types of interactions (hydrogen-bond donor, acceptor, donor/acceptor,
or hydrophobic interaction center). The search is then performed such
that all triangles of appropriate atoms in the ligand are exhaustively
mapped onto triangles of template points with compatible geometry and
chemistry. This mapping is used to generate initial placements of
molecules in the binding site and followed by a series of steps that
refine the initial position, resolve collisions, and consider
flexibility of both the ligand and the protein side chains (cf. note
on hybrid approaches below). Similarly, the rapid docking approach for
library prioritization developed by Diller and Merz (112) is based on
rigid-body triplet matching of ligand atoms onto precalculated hot
spots; subsequently, pruning is performed to remove any positions with
significant steric clash, and the remaining matches are subjected to
energy minimization.
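
The triangle-matching step described above can be illustrated with a
short Python sketch. It is a simplified stand-in, not the SLIDE or
Diller-Merz code: the function names and the distance tolerance are
hypothetical, chemical compatibility is reduced to equal labels (each
template point is assumed to be labeled with the ligand atom type it
accepts), and side lengths alone do not distinguish mirror-image
triangles.

```python
import math
from itertools import combinations, permutations

def side_lengths(a, b, c):
    """Edge lengths of the triangle a-b-c, in a fixed order."""
    return (math.dist(a, b), math.dist(b, c), math.dist(a, c))

def match_triangles(ligand_atoms, template_points, tolerance=0.5):
    """Exhaustively map ligand atom triplets onto template point triplets.

    Both inputs are lists of (label, (x, y, z)). A match requires equal
    labels vertex by vertex and side lengths that agree within tolerance.
    Returns a list of (ligand_indices, template_indices) pairs.
    """
    matches = []
    for lig in combinations(range(len(ligand_atoms)), 3):
        lig_sides = side_lengths(*(ligand_atoms[i][1] for i in lig))
        for tpl in permutations(range(len(template_points)), 3):
            if any(ligand_atoms[i][0] != template_points[j][0]
                   for i, j in zip(lig, tpl)):
                continue  # incompatible chemistry
            tpl_sides = side_lengths(*(template_points[j][1] for j in tpl))
            if all(abs(x - y) <= tolerance
                   for x, y in zip(lig_sides, tpl_sides)):
                matches.append((lig, tpl))
    return matches
```

Each match fixes three atom-to-point correspondences, which is enough
to define a rigid-body placement of the ligand by superposition; that
placement would then be passed on to the refinement steps.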

Pure descriptor matching is efficient for rigid-body docking only.
Flexible docking, in fact, is always faced with the additional problem
of a combinatorial explosion of possible conformers depending on the
number of rotatable bonds. Systematic searches or explicit
consideration of each possible conformation would therefore require
enormous computing resources. A popular way to address this problem
within the class of geometric/combinatorial docking methods is
incremental construction (110, 130, 131, 137). The ligand is dissected
into fragments and incrementally reconstructed in the binding site
starting from a suitably docked base fragment. To avoid dead-end
solutions during construction, multiple placements of the base
fragment have to be considered. In addition, it can be useful to
perform different fragmentations and hence to use different base
fragments as starting points, especially for long and highly flexible
molecules. The docking itself, that is, the placement of the base
fragment and the attachment of remaining portions, is guided by some
descriptor matching procedure.

An example of an incremental construction method is the program FlexX
(110, 130, 138, 139). Conformational flexibility is considered using a
discrete set of preferred torsion angles about acyclic single bonds,
together with multiple conformations for ring systems. These torsion
angle preferences are taken from a library compiled from torsional
fragments extracted from the Cambridge Structural Database (140). The
model of molecular interactions is based on similar rules as
implemented in LUDI, originating from a composite crystal-field
analysis (141). For each group capable of forming an interaction, a
special contact geometry is defined: the group is assigned an
interaction center about which an interaction surface is defined,
usually as part of a sphere. Two groups form an interaction if the
interaction center of one group coincides with the interaction surface
of a counter group. To start the actual docking process, the ligand is
fragmented into components by dissecting at all single bonds that are
not part of a cycle. Out of these components suitable base fragments
are selected. The base fragment is the first portion of the ligand to
be placed into the binding site. This is done by superimposing either
triples or pairs of interaction centers constructed around the base
fragment with triples or pairs of compatible interaction points
generated in the binding region. Normally, a large number of initial
placements is generated, which is then reduced either by clustering
similar solutions or because of clashes with the protein. Next, the
incremental construction of the entire ligand is initiated. Starting
with the different base placements, the ligand is built up by stepwise
linking of the components in compliance with the torsional database.
After hooking up additional fragments, new interactions are searched
and a scoring function is used to select the best partial solutions,
which are expanded in the following step. This is done until the last
fragment has been added and placed to result in the complete ligand.
The generated ligand positions are finally stored and ranked according
to the predicted binding affinity.
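
The build-up loop described above (expand every partial solution,
score, keep only the best, repeat) can be sketched as a simple beam
search. The Python below is a schematic illustration, not the FlexX
algorithm: the names, the beam width, the abstract representation of a
partial solution as a tuple of a base placement label and torsion
values, and the toy scoring function are all assumptions made for the
example.

```python
def incremental_construction(base_placements, n_fragments, torsion_choices,
                             score, beam_width=3):
    """Grow the ligand fragment by fragment, keeping the best partial poses.

    A partial solution is a tuple (base, t1, t2, ...): the chosen base
    placement followed by one torsion value per fragment added so far.
    Lower scores are better.
    """
    partial = [(base,) for base in base_placements]
    for _ in range(n_fragments):
        # Attach the next fragment in every allowed torsional state.
        expanded = [sol + (t,) for sol in partial for t in torsion_choices]
        # Keep only the best-scoring partial solutions for the next step.
        expanded.sort(key=score)
        partial = expanded[:beam_width]
    return partial

# Toy scoring function (assumption): penalize the deviation of each
# torsion from a hypothetical optimum and add a base-placement penalty.
TARGET_TORSIONS = (60, 180, -60)
BASE_PENALTY = {"A": 0.0, "B": 1.0}

def toy_score(solution):
    base, torsions = solution[0], solution[1:]
    return BASE_PENALTY[base] + sum(abs(t - ref)
                                    for t, ref in zip(torsions, TARGET_TORSIONS))

ranked = incremental_construction(["A", "B"], 3, (-60, 60, 180), toy_score)
```

With two base placements and three torsion values for each of three
fragments, a full enumeration would visit 54 complete solutions; the
beam never holds more than nine candidates at a time. The price is the
one named in the text: a good solution is lost if its partial form is
discarded early.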

An anchor-and-grow algorithm has recently also been incorporated into
DOCK (131). Here, after identification of rotatable bonds, the ligand
is fragmented into rigid segments, the largest segment is identified
as the anchor, and the remaining segments are organized as layers
surrounding the anchor. Then the anchor is docked using geometrical
matching. Based on the obtained anchor positions, the conformational
search is initiated by adding segments from the innermost layer and
proceeding outward. This addition is done according to the accessible
torsion angle values along the newly added bond. The default is to use
two alternative settings for bonds between two sp2 hybridized atoms,
three between two sp3 atoms, and six between sp2 and sp3 atoms. The
partial constructs are then locally optimized to minimize the sum of
intra- and intermolecular energies and pruned back to an approximately
constant number of configurations. Pruning is necessary to cope with
combinatorial explosion. It is performed on the basis of the score and
the orientation, such that both the best scoring and most deviating
orientations are retained from each expansion cycle. Finally, after
complete reconstruction of the ligand, the pruned set of binding
configurations is again subjected to local energy minimization.
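
The need for pruning follows directly from the default torsion
settings quoted above. As a small worked example in Python (the
function name and the sample ligand are hypothetical; only the 2/3/6
settings come from the text), the number of conformers per anchor
position that would have to be examined without pruning is the product
of the settings over all rotatable bonds:

```python
from math import prod

# Default number of torsion settings per bond type, as quoted in the text.
TORSION_SETTINGS = {
    ("sp2", "sp2"): 2,
    ("sp3", "sp3"): 3,
    ("sp2", "sp3"): 6,
}

def unpruned_conformer_count(bonds):
    """Number of torsional combinations for a list of rotatable bonds.

    Each bond is a pair of hybridization labels, in either order.
    """
    return prod(TORSION_SETTINGS[tuple(sorted(bond))] for bond in bonds)

# Hypothetical ligand: four sp3-sp3, two sp3-sp2, and one sp2-sp2 bond.
example_bonds = ([("sp3", "sp3")] * 4 + [("sp3", "sp2")] * 2
                 + [("sp2", "sp2")])
count = unpruned_conformer_count(example_bonds)  # 3**4 * 6**2 * 2 = 5832
```

Seven rotatable bonds already give several thousand combinations for
every anchor placement, which is why the set of partial constructs is
cut back to a roughly constant size after each layer.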

This anchor-and-grow implementation in DOCK represents a combination
of a geometric and energy-based approach to docking, due to the
intermediate steps of energy minimization. As already encountered for
SLIDE (111), such multistep or hybrid approaches are commonly found in
current docking protocols. DOCK in general is a prototype of such a
program, originally based solely on rigid geometric descriptor
matching, later enhanced with a variety of additional features. For
example, some degree of flexibility has been introduced into the rigid
docking procedure by dissecting the ligand into a small set of rigid
fragments that are docked separately and then reconnected (128). The
concept of geometric shape complementarity has been extended to
consider physicochemical complementarity by assigning properties to
binding-site spheres and allowing them to match only those ligand
atoms that are of complementary character, an approach referred to as
"sphere coloring" (142, 143). Rigid-body minimization has been
introduced as refinement after the initial descriptor-matching step
(126) or in the variant of on-the-fly optimization using force-field
energies precomputed on a grid (127). In summary, the combination of
different approaches and algorithms to overcome the limitations of
every single approach has provided us with steadily improving
solutions to the docking problem.

3.1.3.2 Energy Driven/Stochastic Procedures. As mentioned above,
docking is essentially an energy optimization problem because the
native binding mode of a ligand can in general be expected to
correspond to the global minimum of the binding free energy (82).
Accordingly, finding this binding mode by docking corresponds to the
identification of the global minimum of the free-energy function.
Because the actual free energy of binding is not accessible to
computation, approximate energy evaluations or scoring functions are
used to guide the search. These functions are required to model the
free-energy surface in an appropriate way: although the absolute
values are not of relevance for the structural aspect of docking, it
is essential that the global minimum of a relative free-energy
function models accurately enough the position of the global minimum
on the real free-energy surface. (It is worth mentioning in this
context that in purely geometrical or descriptor-based docking
procedures, the central assumption is that the degree of surface
complementarity or matching between descriptors is proportional to the
interaction energy.)

With a suitable energy function available, docking can be performed by
global minimization of the energy with respect to the position,
orientation, and conformation of the ligand. However, this apparently
straightforward approach bears two fundamental problems, inherently
related to characteristics of the energy landscape of protein-ligand
interactions: the high dimensionality, which precludes a systematic,
exhaustive search; and the ruggedness of the surface, reflected by a
large number of local minima. Because of this last aspect, standard
energy minimization techniques alone are not useful for docking
applications because they can guide the search only to the next local
minimum. They are used, however, in combination with other techniques
and play a valuable role at certain stages of the docking process,
primarily to refine docked positions and conformations by exploring
the local energy landscape in the vicinity of this position.

To address the docking problem, techniques for a more global
exploration of the energy landscape are required. A variety of methods
is available, frequently used in the context of other modeling
applications and optimization problems as well. Three major classes
may be distinguished: Monte Carlo techniques, molecular dynamics
simulations, and genetic algorithms. Many different variants exist for
all of them and frequently, in docking procedures, they are applied in
combination with other techniques.

Monte Carlo methods consist of two essential components that are
repetitively applied: a random walk of the ligand through the
receptor-near space (i.e., the random displacement along
translational, rotational, and/or torsional degrees of freedom), and
the evaluation of the new configuration based on the Metropolis
criterion (144). This criterion decides whether a new position is
accepted and hence on the configuration from where the search will
proceed. If the energy of the new docked position ($E_{\mathrm{new}}$)
is more favorable (lower) than the energy of the previous position
($E_{\mathrm{old}}$), the new position is accepted. If it is less
favorable, the probability P for its acceptance is given by

\[ P = \exp\left[-\frac{E_{\mathrm{new}} - E_{\mathrm{old}}}{kT}\right] \]

where k is the Boltzmann constant and T is the effective temperature.
To turn this sampling technique into an efficient optimization method
applicable to docking, it has to be combined either with a temperature
lowering protocol or with some local minimization steps. The former
approach is known as Monte Carlo simulated annealing, the latter as
Monte Carlo minimization.
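
The Metropolis criterion stated above translates almost literally into
code. The Python sketch below is illustrative only; the function
names, the choice of kcal/mol as the energy unit, and the
corresponding molar value of the Boltzmann constant are assumptions
made for the example.

```python
import math
import random

# Boltzmann constant per mole in kcal/(mol K); assumes energies in kcal/mol.
K_BOLTZMANN = 0.0019872041

def acceptance_probability(e_new, e_old, temperature, k=K_BOLTZMANN):
    """Metropolis acceptance probability for a move from e_old to e_new."""
    if e_new <= e_old:
        return 1.0  # more favorable (or equal): always accept
    return math.exp(-(e_new - e_old) / (k * temperature))

def metropolis_accept(e_new, e_old, temperature, rng=random):
    """Draw a uniform random number to decide whether the move is accepted."""
    return rng.random() < acceptance_probability(e_new, e_old, temperature)
```

At T = 300 K the product kT is about 0.6 kcal/mol, so a move that
raises the energy by 1 kcal/mol is accepted in roughly one out of five
attempts, whereas at T = 3000 K the same move is accepted most of the
time. This temperature dependence is what the annealing protocol
exploits.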

In simulated annealing, the effective temperature T is initially set
to a high value and gradually lowered, after a predefined number of
Monte Carlo steps has been performed at a given temperature. At high
temperatures, a broad region of configuration space is sampled: energy
barriers can be surmounted because of the high acceptance probability
for less favorable placements. As the temperature is lowered, this
becomes less probable and the configuration is optimized more locally.
Given the stochastic nature of the process, multiple independent runs
are required to assess convergence (this equally applies to many of
the methods further described below). Examples of docking programs
using Monte Carlo simulated annealing as a search strategy are
AutoDock (113-115), RESEARCH (145, 146), and MCDOCK (147).
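
The annealing protocol can be illustrated on a one-dimensional toy
problem. The following Python sketch is a minimal example under stated
assumptions, not the protocol of any of the programs cited: the rugged
test function, the geometric cooling schedule, and all parameter
values are hypothetical, and the temperature is expressed directly in
energy units (i.e., it stands for kT).

```python
import math
import random

def rugged_energy(x):
    """Toy 1D landscape (assumption): a parabola with many local minima."""
    return 0.1 * x * x + math.sin(5.0 * x)

def simulated_annealing(energy, x0, t_start=5.0, t_end=0.01, cooling=0.9,
                        steps_per_t=100, step_size=1.0, seed=0):
    """Monte Carlo simulated annealing; returns the best (x, energy) seen."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):       # fixed number of steps per T
            x_new = x + rng.uniform(-step_size, step_size)  # random walk
            e_new = energy(x_new)
            # Metropolis criterion at the current temperature.
            if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
                x, e = x_new, e_new
                if e < best_e:
                    best_x, best_e = x, e
        t *= cooling                       # gradually lower the temperature
    return best_x, best_e

best_x, best_e = simulated_annealing(rugged_energy, x0=8.0)
```

Changing the seed gives an independent run; comparing the results of
several such runs is the convergence check mentioned in the text.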

In Monte Carlo minimization, an additional step is inserted after the
random walk before Metropolis evaluation. This step is a local energy
minimization, using techniques such as steepest descent or conjugate
gradient. Full local minimization after each random-walk step has been
reported to improve the efficiency of the procedure (148, 149). A
docking procedure that uses global Monte Carlo minimization is the ICM
program of Totrov and Abagyan (82, 116, 117). ICM describes both the
relative positions of two molecules and their conformations by a
uniform set of internal variables and uses precalculated grids of the
interaction energies to speed up calculations. Trosset and Scheraga
use Monte Carlo minimization in their ProDock program; computational
efficiency is enhanced by a grid-based energy evaluation using Bezier
splines, which enables one to evaluate gradients and hence to perform
minimization on a 3D grid (118, 119). Further Monte Carlo minimization
docking procedures have been reported by Caflisch et al. (150, 151).
Also, the QXP program of McMartin and Bohacek relies on Monte Carlo
techniques combined with energy-minimization procedures (152).
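
The difference from plain Metropolis sampling is the minimization step
between the random move and the acceptance test. The Python sketch
below illustrates this on a one-dimensional toy function. It is not
the ICM or ProDock procedure: the test function, the fixed-step
steepest descent with a numerical gradient, and all parameter values
are assumptions made for the example, and the temperature again stands
for kT in energy units.

```python
import math
import random

def rugged_energy(x):
    """Toy 1D landscape (assumption) with many local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)

def local_minimize(energy, x, step=0.01, iterations=200, h=1e-5):
    """Fixed-step steepest descent using a central-difference gradient."""
    for _ in range(iterations):
        gradient = (energy(x + h) - energy(x - h)) / (2.0 * h)
        x -= step * gradient
    return x

def monte_carlo_minimization(energy, x0, n_steps=200, step_size=2.0,
                             temperature=0.5, seed=0):
    """Random move, local minimization, then Metropolis evaluation."""
    rng = random.Random(seed)
    x = local_minimize(energy, x0)
    e = energy(x)
    best_x, best_e = x, e
    for _ in range(n_steps):
        trial = x + rng.uniform(-step_size, step_size)  # random walk step
        trial = local_minimize(energy, trial)           # inserted step
        e_trial = energy(trial)
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / temperature):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

best_x, best_e = monte_carlo_minimization(rugged_energy, x0=8.0)
```

Because every trial is minimized before it is judged, the Metropolis
test compares local minima with one another, so the walk effectively
hops between basins instead of spending moves on their slopes.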

Molecular dynamics (MD) simulations represent another technique to
sample configuration space (31, 153-157). Based on Newton's equation
of motion and principles of statistical thermodynamics, the standard
application of this technique is to analyze flexibility and dynamic
properties of molecular systems and to calculate free energies in a
theoretically rigorous manner (158-163). With respect to
protein-ligand docking, MD simulations could in principle be used to
simulate the actual binding process, thus providing a "realistic" view
of how the docking process proceeds, although this is computationally
still out of reach. In fact, standard MD requires massive
computational resources, which limits its application to a small
number of selected systems. In the context of docking, the problem is
that standard MD is slow in exploring global features (crossing of
large barriers and exploration of multiple binding sites);
accordingly, MD is essentially limited to the simulation and
refinement of already bound complexes. Di Nola et al. have addressed
this problem in their MDD (MD docking) algorithm (164, 165). This
method separates the ligand's center of mass motion from its internal
motions. A separate coupling to different thermal baths for both types
of motion of the ligand and the receptor is performed. Because the
temperature and the time constants of coupling to the baths can be
varied arbitrarily, it is possible to increase the kinetic energy of
the center of mass of the ligand without increasing the temperature of
the internal motions of receptor and ligand. This allows for complete
control of the search rate. The technique was applied to the docking
of phosphocholine to antibody McPC603, starting from distinct
positions well separated from the actual binding site. After
appropriate sampling, the average structure of the complex in the
binding region was found to closely resemble the crystal structure.
Still, the method remains computationally expensive, and thus it is
not yet suited for a large-scale application to practical drug design
docking problems.

Other docking applications of MD have been reported as well. In a
comparison of a CHARMM-based MD docking algorithm with a Monte Carlo
and a genetic algorithm, Vieth et al. have observed a comparatively
good performance of the MD search for the five analyzed test cases
(166). Pak et al. have recently presented a docking approach based on
so-called q-jumping MD (167, 168); its basic idea is to apply a
smoothed generalized effective potential to enhance conformational
sampling by MD. Luty et al. have combined a grid representation for
the bulk portion of the receptor with MD simulations of the ligand in
the flexible binding site (169). Multiple-copy simultaneous search
methods (MCSS) can help to speed up energy-based searches. They use
numerous ligand copies that are transparent to each other, but subject
to the full force of the protein (170, 171). Finally, short MD
simulations are occasionally used at some stage of a docking
procedure, primarily with the purpose of local refinement, as for
example in the multistep docking strategy of Wang et al., where the
last step is an MD-based simulated annealing (129).

The third major class of search methods are genetic algorithms (GAs),
which are widely used for docking purposes. GAs are stochastic
optimization methods inspired by the concepts of evolution (172-174).
The optimization problem is generally formulated in the language of
genetics. Initially, a random population is generated in which each
member corresponds to a potential solution of the problem. A member of
the population is represented by its chromosome, in which the
variables to be optimized are encoded. This means that each chromosome
contains a number of genes, where the genes correspond to the value of
a certain variable or set of variables. In the case of docking, the
variables for translation and rotation, as well as the torsion angles
of the ligand, are encoded in the chromosome. Genetic operators are
then applied to the initial population to generate a new population.
In general, these operators are "crossover," by which genes from two
distinct chromosomes are interchanged to generate two new individuals,
and "mutation," by which a given gene is randomly modified. For each
newly generated individual the chromosome is decoded (genotype →
phenotype) and the fitness of the individual is evaluated. In the
context of docking, this fitness is the interaction energy or docking
score. Individuals with better scores receive a higher chance of being
selected as members of the new population, and thus a higher chance of
survival and reproduction into the next generation. Accordingly, the
average fitness increases from generation to generation, until, at
some point, the process is terminated (by reaching either a fixed
number of generations or a constant fitness of the population). The
best individual of this final population represents the solution.
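
The generational loop described above can be sketched compactly in
Python. This is a generic toy GA, not the algorithm of GOLD, AutoDock,
or any other program: the names, the tournament selection scheme, the
single-point crossover, the parameter values, and the practice of
remembering the best individual seen so far are all assumptions made
for the example. The chromosome is a plain list of real-valued genes
(standing in for translation, rotation, and torsion variables), and
lower fitness values are treated as better, as for an interaction
energy.

```python
import random

def genetic_algorithm(fitness, n_genes, bounds, pop_size=30, generations=40,
                      mutation_rate=0.1, seed=0):
    """Minimize `fitness` over chromosomes of n_genes (>= 2) real genes.

    Returns the best chromosome found and the best fitness per generation.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    # Initial random population: each member is one candidate solution.
    population = [[rng.uniform(lo, hi) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = min(population, key=fitness)
    history = []

    def select():
        # Tournament of two: the fitter individual is chosen as a parent.
        a, b = rng.sample(population, 2)
        return a if fitness(a) < fitness(b) else b

    for _ in range(generations):
        next_population = []
        while len(next_population) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, n_genes)          # crossover point
            for child in (p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]):
                # Mutation: randomly replace individual genes.
                child = [rng.uniform(lo, hi) if rng.random() < mutation_rate
                         else gene for gene in child]
                next_population.append(child)
        population = next_population[:pop_size]
        best = min(population + [best], key=fitness)
        history.append(fitness(best))
    return best, history
```

A call such as `genetic_algorithm(lambda c: sum(g * g for g in c), 4,
(-5.0, 5.0))` minimizes a simple quadratic; in a docking setting the
fitness function would decode the chromosome into a ligand pose and
return its score.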

Many different variants and implementations of GAs for docking exist,
but the general features are always similar. The application of GAs in
drug design and docking has been reviewed by Clark et al. (175). A
prominent example of a docking program based on a GA is GOLD (176,
177). A special characteristic of GOLD is the direct encoding of
hydrogen bonding motifs in the chromosome representation. Upon
chromosome decoding, a least-squares fit is used to optimize the
overlap of complementary pairs of hydrogen-bonding sites present in
the ligand and the receptor. The newest version of AutoDock contains
an interesting variant of a GA, a so-called Lamarckian GA (115). This
is the combination of a traditional GA with a local search method to
perform energy minimization. At each generation, a user-defined
fraction of the population is subjected to such a local minimization.
This hybrid algorithm was found to be more efficient than a
traditional GA, also implemented in AutoDock. A conceptually similar
strategy has recently been implemented into the docking program DARWIN
(178). Here, a standard GA is combined with a gradient minimization
search strategy through an interface to the CHARMM molecular mechanics
program (179). Further GA-based docking methods can be found in the
literature (86, 180-183).

Another class of evolutionary algorithms that has occasionally found
application in the context of docking is known as evolutionary
programming (184). Its main difference with respect to GAs is that
there is no recombination (crossover) operator, such that evolution is
wholly dependent on mutation. Gehlhaar et al. (185) and Westhead et
al. (186) have demonstrated the applicability of evolutionary
programming to the docking problem, although in a comparative study
other algorithms were found to be more effective (186). A new variant
called "family competition evolutionary algorithm" has recently been
proposed for docking (187).

Besides the three major classes of energy-driven searches (MD, MC,
GA), some further heuristic algorithms and search strategies have been
developed or adapted for the docking problem. "Tabu search" was found
to perform well in comparison with other algorithms (186) and has thus
become the main search strategy of the PRO-LEADS docking program (188,
189). Briefly, the tabu search operates on randomly generated
positions that are examined on the basis of a tabu list. This list
contains a number of previously generated solutions and serves to
impose restrictions on the search process: a random move of the ligand
is considered "tabu" if it generates a solution that is not
sufficiently different from the stored solutions, unless its energy is
more favorable than the energy of the best solution so far. Using
these restrictions, the search is prevented from revisiting regions of
the search space and the exploration of new areas is encouraged. Ideas
from tabu search are also used in the recently described adaptation of
the Mining Minima algorithm for protein-ligand docking (190). Here, an
exclusion zone is placed around each energy minimum as it is
discovered, to avoid rediscovering it in future docking iterations.
Mining Minima itself is based on a variety of optimization techniques
to gradually focus a large region of random search to areas around the
lowest energy minima.
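
The tabu rule described above (reject moves that land too close to a
stored solution unless they beat the best energy found so far) can be
sketched on a one-dimensional toy problem. The Python below is a
generic illustration, not the PRO-LEADS implementation: the test
function, the distance threshold, the list length, the number of
random moves per iteration, and the fallback used when every candidate
is tabu are all assumptions made for the example.

```python
import math
import random

def rugged_energy(x):
    """Toy 1D landscape (assumption) with many local minima."""
    return 0.1 * x * x + math.sin(5.0 * x)

def tabu_search(energy, x0, n_iter=200, n_moves=20, step=1.0,
                tabu_size=25, min_dist=0.2, seed=0):
    """Tabu search over a single coordinate; returns the best (x, energy)."""
    rng = random.Random(seed)
    current = x0
    best_x, best_e = x0, energy(x0)
    tabu = [x0]                           # previously visited solutions
    for _ in range(n_iter):
        # Generate random moves from the current solution, best first.
        candidates = sorted((current + rng.uniform(-step, step)
                             for _ in range(n_moves)), key=energy)
        chosen = None
        for cand in candidates:
            is_tabu = any(abs(cand - t) < min_dist for t in tabu)
            # A tabu move is still allowed if it beats the best so far.
            if not is_tabu or energy(cand) < best_e:
                chosen = cand
                break
        if chosen is None:                # every move is tabu (fallback)
            chosen = candidates[0]
        current = chosen
        tabu.append(current)
        if len(tabu) > tabu_size:         # forget the oldest entry
            tabu.pop(0)
        e = energy(current)
        if e < best_e:
            best_x, best_e = current, e
    return best_x, best_e

best_x, best_e = tabu_search(rugged_energy, x0=8.0)
```

Note that the search may accept a move to a higher energy when all
better candidates are tabu; this is what drives it out of a local
minimum and into unexplored regions.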

3.2 Special Aspects of Docking 

Besides the general characteristics outlined above, there are a number
of special issues associated with the docking methodology that deserve
explicit consideration: protein flexibility, water molecules, and
objective assessment. In addition, the interplay of docking with QSAR
methods and homology modeling is of further interest to highlight the
possibilities opened by combined application of standard methods in
structure-based drug design.

3.2.1 Protein Flexibility. Proteins are inherently dynamic systems
(153, 191). A single, fixed conformation, even the average provided by
a crystal structure, may not be an adequate representation of the
protein, unless the system is very rigid (192). Instead, even under
standard equilibrium conditions, the native folded state of a protein
is best characterized by a collection or ensemble of energetically
nearly equivalent conformations. If the conditions are changed, the
local minima and the population of these states may shift, eventually
resulting in an observable change of the average structure. Also, the
introduction of a ligand corresponds to a change of the environment
that may lead to similar effects. Accordingly, the binding
conformation of the receptor may already be present in the ensemble of
protein conformations (193, 194) and the ligand does not actively
deform a fixed state of the protein, as generally inferred from the
"induced fit" model.

Whatever the actual mechanism might be, 
the comparison of experimental protein structures
in the ligand-free and in the complexed
state frequently shows protein conforma¬ 
tional changes induced by or associated with 
ligand binding (195). The spectrum of phe¬ 
nomena ranges from side-chain rotations to 
loop rearrangements and the movement of entire
domains. Accordingly, in the context of
docking it is frequently not justified to neglect
protein flexibility (35). If no alternative for 
docking into the rigid protein is available, at 
least a protein conformation (possibly from a 
complex structure) should be used that is com¬ 
patible with suitable binding modes. Obvi¬ 
ously, a preferable docking tool would con¬ 
sider full protein flexibility, but appropriate 
realization of this goal remains a challenge be¬ 
cause of the high dimensionality of protein 
conformation space. Consideration of protein 
flexibility also complicates the problem of 
scoring and selecting the best ligand place¬ 
ment, given the difficulty in accurately evalu¬ 
ating protein conformational free energies in 
addition to ligand-binding free energies. 

Current approaches to the problem of flex¬ 
ible protein docking have recently been re¬ 
viewed by Carlson and McCammon (196), and 
more briefly by Abagyan and Totrov (18) and 
Claussen et al. (197). The methods differ by 
the degree of flexibility they can cover. The 
least complex methods are those that model 
small adjustments of contact residues and side 
chains in an implicit way using soft docking. 
The protein itself remains fixed, but either 
through an adapted geometric representation 
or using a tolerant scoring function a certain 
amount of overlap between the protein and 
the ligand is allowed, emulating some "plastic¬ 
ity" of the receptor. The docking program by 
Jiang and Kim based on the matching of mo¬ 
lecular surface cubes is explicitly based on this 
soft docking idea (103). Other more recent 
docking approaches have implemented a soft 
scoring function (198). The advantage of these
simple approaches is that they do not increase 
the demands on computing time. 
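
As a rough illustration of the soft scoring idea, the sketch below compares a standard 12-6 Lennard-Jones term with a softened variant whose repulsive wall is capped. The function names, the parameter values, and the capping scheme are assumptions made for illustration only; they are not taken from any of the programs cited above.

```python
def lj_12_6(r, epsilon=0.2, r_min=3.8):
    """Standard 12-6 Lennard-Jones energy (kcal/mol) at distance r (angstrom).

    epsilon is the well depth and r_min the distance of the energy minimum;
    both values are illustrative, not taken from a specific force field.
    """
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)


def soft_lj(r, epsilon=0.2, r_min=3.8, cap=1.0):
    """Softened variant: the repulsive wall is capped at `cap` kcal/mol.

    Capping tolerates a small protein-ligand overlap, which emulates some
    plasticity of an otherwise rigid receptor. The cap value is hypothetical.
    """
    return min(lj_12_6(r, epsilon, r_min), cap)


if __name__ == "__main__":
    # Compare both terms from a clashing distance out to a loose contact.
    for r in (2.5, 3.0, 3.8, 5.0):
        print(f"r={r:.1f}  hard={lj_12_6(r):9.3f}  soft={soft_lj(r):6.3f}")
```

Because only the functional form of the score changes, the cost per evaluation stays essentially the same as for the hard potential, which is the advantage noted above.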

The next level is represented by methods 
that allow for explicit side-chain flexibility. 
GOLD's genetic algorithm can handle the rotation
of a few terminal hydrogen-bond donor
and acceptor groups to optimize the hydrogen- 
bonding network (176, 177). A technique to 
handle larger side-chain movements is the use 
of side-chain rotamer libraries, as first demon¬ 
strated by Leach. In this approach, heuristic 
algorithms such as dead-end elimination are 
used to search the large combinatorial space 
(199). Schaffer and Verkhivker instead use a 
rotamer library to first generate likely side-chain
conformations, which are then subjected
to energy minimization together with
the docked ligand (200). Another approach 
making use of minimization has been de¬ 
scribed by Apostolakis et al.: after "seeding" 
the receptor with randomly generated ligand 
positions that may overlap with the protein, 
the complex is subjected to minimization, dur¬ 
ing which nonbonded interactions are gradu¬ 
ally switched on, to gently relieve steric over¬ 
lap by minor conformational changes of the 
ligand and receptor. The best-ranked solu¬ 
tions are then subjected to further refinement 
by Monte Carlo minimization (151). Further¬ 
more, the Monte Carlo minimization tech¬ 
nique in internal coordinates of the ICM pro¬ 
gram can sample and optimize side-chain 
torsions during ligand docking (117, 201). Fi¬ 
nally, the docking tool SLIDE allows for some 
side-chain flexibility at the optimization stage 
of initial placements. In SLIDE, collisions are 
resolved by rotations about single bonds in the 
ligand and the protein side-chains to reduce a 
maximal number of collisions by minimal con¬ 
formational changes of both binding partners 
(111, 202).

An alternative to account, in principle, for 
an arbitrary degree of protein flexibility is the 
use of protein structure ensembles. The ensembles
could be assembled from multiple
crystal structures of a given protein, from 
NMR structure determination, or from trajec¬ 
tories of molecular dynamics simulations. In 
addition, a rotamer library can be used to cre¬ 
ate a minimal set of new conformations (203). 
Whatever the origin of the individual mem¬ 
bers of the ensemble, each represents a dis¬ 
tinct conformational state of the protein, and 
may eventually correspond to the preferred li¬ 
gand-binding state. Three different ways to 
use protein ensembles for docking can be dis¬ 
tinguished: in its most straightforward form, 
docking is carried out sequentially with each 
member of the ensemble using rigid-receptor 
docking (124, 204-206). Another way is to use
a weighted-average representation of the ensemble.
Knegtel et al. followed this approach
by generating composite grids that were used
for scoring within the DOCK program (207).
Recently, it has also been tested with
AutoDock (208). Broughton has developed another
method by combining statistical analysis
of a conformational ensemble from short
MD simulations with grid-based docking protocols
(209). The third and most sophisticated
approach to handle protein ensembles is implemented
in FlexE, a variant of the FlexX
program (197). FlexE is based on a united protein
description generated from the superimposed
structures of the ensemble. For the
parts that differ among the protein structures, 
discrete alternative conformations are explic¬ 
itly taken into account on the fly during the 
incremental construction of the ligand in the 
binding site. As an important feature, these 
geometric alternatives are optimally joined to 
create new valid protein structures in a com¬ 
binatorial fashion. Thus, conformations of the 
protein are not limited to those explicitly 
present in the ensemble, nor are the interac¬ 
tions blurred by averaging over distinct alter¬ 
native instances, which may correspond to un¬ 
realistic protein conformations. 
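
The two simpler ensemble strategies described above can be sketched as follows. This is a minimal illustration under stated assumptions: each ensemble member is represented by a precomputed score grid stored as a flat list, a ligand pose is reduced to the indices of the grid points it occupies, and lower scores are better. The function names are hypothetical and do not reproduce the DOCK, AutoDock, or FlexE implementations.

```python
def composite_grid(grids, weights):
    """Weighted average of precomputed score grids, one grid per ensemble member.

    Every grid is a flat list of scores evaluated at the same grid points.
    """
    if len(grids) != len(weights):
        raise ValueError("one weight per grid is required")
    total = sum(weights)
    n_points = len(grids[0])
    return [
        sum(w * g[k] for g, w in zip(grids, weights)) / total
        for k in range(n_points)
    ]


def best_over_ensemble(grids, ligand_points):
    """Sequential strategy: score the pose on every member and keep the best.

    `ligand_points` holds the indices of grid points occupied by ligand atoms.
    Returns (index of the best member, its score); lower scores are better.
    """
    scores = [sum(g[k] for k in ligand_points) for g in grids]
    best = min(range(len(grids)), key=scores.__getitem__)
    return best, scores[best]
```

The averaged grid illustrates the drawback mentioned above: a grid point that is favorable in one member and unfavorable in another ends up with an intermediate score that corresponds to neither conformation.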

The so-called Low Mode Search (LMOD), 
originally established as a method for confor¬ 
mational analysis (210), has recently been 
demonstrated to be applicable also to the prob¬ 
lem of docking flexible ligands into flexible 
protein binding sites (211). To explore the po¬ 
tential energy surface of molecules, LMOD is 
based on eigenvector following, where eigen¬ 
vectors correspond to the (low-frequency) 
"normal modes" of vibration. For the purpose 
of docking, LMOD has been combined with a 
limited torsional Monte Carlo movement, as 
well as random translation and rotation of the 
ligand. 

Generally, however, full consideration of 
flexibility, either of the binding site or the 
entire protein, remains the domain of MD sim¬ 
ulations. The disadvantage is their high com¬ 
putational demand required to achieve signif¬ 
icant sampling. Simplified MD restricted to 
the binding site has been used by Luty et al., 
where the bulk of the protein receptor is rep¬ 
resented as a grid, whereas a full atomic de¬ 
scription is used only for the proximity of the 
binding site to include flexibility in the docking
process (169). The approach of Mangoni et
al. mentioned above provides a method for en¬ 
hanced sampling. It has been used to dock a 
ligand into a receptor that is treated fully flex¬ 
ible and solvated with explicit water molecules 
(165). Alternatively, shorter MD runs may be 
used at intermediate or final stages of a docking
procedure to refine complexes generated
by rigid-body docking methods. In this case,
however, flexibility is not considered simultaneously
with the docking process. It thus only
refines solutions from rigid-receptor docking
and does not enhance the scope of the search
for possible binding modes.

3.2.2 Water Molecules. Water plays a cru¬ 
cial role in molecular interactions (212, 213). 
At the interface of a protein-ligand complex, 
water molecules can have a significant impact 
on complex formation, either by mediating or 
improving specificity and affinity of the inter¬ 
action. They promote adaptability, thus allow¬ 
ing for promiscuous binding (214). Individual 
conserved ("structural") water molecules can 
therefore be crucial for the successful design 
of new inhibitors. A prominent example is the 
structural water molecule observed in nearly 
all HIV protease complexes with substrate¬ 
like inhibitors. Attempts at replacing it have 
guided the design of new tight-binding inhibi¬ 
tors [e.g., (215)]. Instead of the usual implicit 
modeling of solvation effects, explicit consid¬ 
eration of structural water molecules and wa¬ 
ter-mediated interactions would therefore be 
a highly desirable feature in docking methods. 
Ideally, simultaneously with the ligand placement,
the docking program should be able to
predict whether at a particular site water mol¬ 
ecules mediating protein-ligand interactions 
may preferably reside or whether the displace¬ 
ment of these water molecules by appropriate 
ligand functional groups would be more favor¬ 
able. No docking tool is yet available to accom¬ 
plish this task. Obviously, not only the place¬ 
ment of water molecules is demanding, but 
especially their energy scoring, resulting from 
the complicated thermodynamics associated 
with water interactions. 

In principle, MD simulations provide the 
most natural route to the explicit consider¬ 
ation of water molecules. In the MD docking 
approach described by Mangoni et al., explicit 
water molecules are indeed used (165). It was 
found, though, that the presence of explicit 
water molecules shields the interactions be¬ 
tween the ligand and the receptor. Conse¬ 
quently, different weights were applied to the 
ligand-receptor and ligand-solvent interactions,
respectively, to cope with this complication.
Because of the high computational costs,
the approach seems affordable only in special 
cases where the presence of explicit solvent 
appears important. 

An approach to explicitly place water mol¬ 
ecules during fast docking has been intro¬ 
duced into FlexX (216). In a preprocessing 
phase, possible favorable water sites in the 
binding pocket are calculated and stored. Dur¬ 
ing the incremental construction phase of 
FlexX, water molecules are switched on at 
these sites if they provide additional hydrogen 
bonds to the ligand. Steric constraints pro¬ 
duced by these water molecules and the qual¬ 
ity of the achieved hydrogen bond geometry 
are then used to optimize the ligand orienta¬ 
tion during the construction process. In sev¬ 
eral cases, water molecules between protein 
and ligand could be correctly predicted; how¬ 
ever, the overall improvement on the FlexX 
docking results for a test set of 200 complexes 
was nearly negligible. 

The program SLIDE can consider tightly 
bound waters while docking potential ligands 
(111). To select which water molecules to re¬ 
tain and which to remove from the binding 
pocket before docking, the knowledge-based 
approach Consolv (217) is applied to determine
those waters that are likely to be conserved
upon ligand binding and to adjust a
penalty for their displacement. Once these waters
have been selected to be initially retained
upon docking, SLIDE either translates or dis¬ 
cards a water molecule to remove overlap with 
ligand atoms after the ligand has been docked 
to the binding site. Displacement of a water 
molecule is performed only if collisions cannot 
be resolved by iterative translations. Any dis¬ 
placements by nonpolar ligand atoms are pe¬ 
nalized upon scoring. In database screening 
runs on three different target proteins, this 
procedure was found to produce reasonable re¬ 
sults with respect to water-mediated interac¬ 
tions, but no systematic test has been reported 
so far. 

As long as a simultaneous docking of water 
molecules and ligands is an unsolved problem, 
it remains common practice to consider essen¬ 
tial water molecules as a fixed part of the bind¬ 
ing site. Preplaced water molecules may either 
correspond to recurrently observed waters 
found in multiple crystal structures of the target
protein, or to predicted positions based on
estimated water affinity potentials suggested 
by programs such as GRID (218-220). The lat¬ 
ter strategy has been applied by Minke et al. 
using AutoDock (221), showing that success¬ 
ful docking of carbohydrate derivatives to the 
heat-labile enterotoxin critically depends on 
the inclusion of water molecules. Examples for 
the consideration of experimentally observed 
water molecules as part of the target during 
docking are the studies of Rao et al. (docking 
to factor Xa using AutoDock) (222) and Pospisil
(docking to thymidine kinase using
AutoDock and FlexX) (223). The influence of 
explicit water molecules in docking was also 
investigated in the validation study of the new 
program DARWIN (178). Inclusion of explicit 
water molecules was essential in some cases, 
unless interaction energies were calculated 
with a Poisson-Boltzmann-based implicit solvent
model. Yet another example is a search
for metallo-β-lactamase inhibitors (14) with
the docking program FLOG. Docking was per¬ 
formed with three different configurations of 
bound water in the active site. The top-scoring 
compounds showed an enrichment in biphenyl 
tetrazoles. A crystal structure of one tetrazole 
not only confirmed the predicted binding 
mode but also displayed the water configura¬ 
tion that had, retrospectively, been the most 
predictive one of the three models. Further 
examples from virtual screening studies are 
available that show that the inclusion of con¬ 
served water molecules in the docking process 
can dramatically improve the hit rate (15, 16).

3.2.3 Assessment of Docking Methods. Docking
methods are usually assessed by their
ability to reproduce the binding mode of ex¬ 
perimentally resolved protein-ligand com¬ 
plexes: the ligand is removed from the com¬ 
plex, a search area is defined around the actual 
binding site, the ligand is redocked into the 
protein, and the achieved binding mode is 
compared with the experimental position, 
usually in terms of a root-mean-square deviation
(rmsd). If the rmsd is below 2 Å, it is generally
considered a successful prediction. The
obvious goal is that such a "near-native" solution
is ranked best among the set of ligand
poses generated. Virtually any introduction of
a new docking method has been accompanied
by such a test. The number of complexes used
has varied as much as the reported success
rates, which are between 10% (224) and 100%
(152). Clearly, success rates of 100% are
a consequence of the limited test set
size rather than a reflection of the quality of the
docking method.
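
The redocking test described above can be sketched in a few lines. This is a minimal illustration, assuming that the docked pose and the crystal pose list the same atoms in the same order and share the coordinate frame of the protein, so no superposition is needed. The function names are hypothetical, and ligand symmetry (equivalent atoms that may be swapped) is deliberately ignored.

```python
import math


def rmsd(pose, reference):
    """Root-mean-square deviation between two equally ordered coordinate lists.

    No superposition is performed: both poses are assumed to be expressed in
    the frame of the same protein structure, as in a redocking experiment.
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(pose, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(pose))


def is_near_native(pose, reference, threshold=2.0):
    """Apply the common success criterion of an rmsd below 2 angstrom."""
    return rmsd(pose, reference) < threshold


def success_rate(results, threshold=2.0):
    """Fraction of complexes whose top-ranked pose is near-native.

    `results` is a list of (top_ranked_pose, crystal_pose) pairs.
    """
    hits = sum(is_near_native(p, ref, threshold) for p, ref in results)
    return hits / len(results)
```

With only a handful of complexes in `results`, the returned fraction moves in large steps, which is one reason why success rates from very small test sets say little about a method.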

Numerous critical issues have to be ad¬ 
dressed in this context. Validations carried out 
on very few complexes (<20) do not adequately
assess the scope of the method, particularly
if no attempt was made to select a
representative set of structures that appropri¬ 
ately covers a broad range of binding features 
important to protein-ligand complexes. Up to 
now, only a few docking methods have been 
assessed on a broad range of complexes [e.g., 
FlexX (200 complexes) (139), ScoreDock and 
DOCK (200 complexes) (225), EUDOC (154 
complexes) (125), DOCK, FlexX, and Drug- 
Score (100-150 complexes) (226), GOLD (100 
complexes) (177), the method of Diller and 
Merz (using the GOLD test set) (112), and 
PRO-LEADS (70 complexes) (189)]. In the 
case of GOLD it has been explicitly mentioned 
that the test set was selected by a researcher 
not involved in the development of the algo¬ 
rithm (177). The definition of an objective and 
relevant reference test set that could serve as 
standard benchmark for every new docking 
method would be highly desirable for both 
user and developer (18). First efforts in developing
a database that could be of use in this
context have been reported (227). Suitable 
test sets should cover a sufficient number of 
highly diverse protein-ligand complexes, in¬ 
cluding cases that provide some challenge to 
docking methods (e.g., water-mediated inter¬ 
actions, interactions with metal ions). To test 
performance with respect to potential induced 
fit, the structure of the unligated protein or 
alternative complexes with different bound li¬ 
gands should be available as well. The test set 
should comprise fully resolved crystal structures
with a resolution of <2.5 Å. Complexes
with ligands significantly involved in crystal 
packing contacts should be avoided. Such 
cases will likely fail in reproducing the exper¬ 
imental binding mode because of missing con¬ 
tacts present only in the packing environment 
(228). Finally, the importance of studying low-affinity
or "non-binding" ligands must be addressed;
accordingly, experimental information
about the binding geometry and affinity
of some weak-binding ligands should also be 
available. 

In addition to the tests usually reported by 
the authors of a program, comparative studies 
have been reported on the assessment of dif¬ 
ferent docking and scoring approaches. In part 
they also address some of the aspects raised 
above. Westhead et al. have presented a com¬ 
parison of four heuristic search algorithms 
(simulated annealing, genetic algorithm, evo¬ 
lutionary programming, and tabu search) 
(186). In an attempt to provide an unbiased 
comparison, all algorithms were implemented 
into the PRO-LEADS program and a single 
scoring function was used. Other recent examples
are the studies of Ha et al., who compared
DOCK (using two different scoring functions) 
and FlexX (229), and, in the context of virtual 
screening, the work of Bissantz et al., who 
compared DOCK, FlexX, and GOLD together 
with seven different scoring functions (230) 
(cf. also Section 5.2 below). 

An unbiased test scenario is guaranteed if 
researchers are provided with a set of protein- 
ligand complexes of experimentally resolved, 
but yet unpublished structure. Two such blind 
trial competitions have been carried out so far 
(231, 232). A series of interesting issues re¬ 
garding docking tests and problems with true 
predictions have been amply discussed by 
Dixon (231) and participants in the CASP2 
docking competition (117, 145, 233, 234). Unfortunately,
the number of targets subjected
to such blind tests has so far been rather
small. A major limitation to such blind comparisons
is the availability of experimental
data before publication. 

3.2.4 Docking and QSAR. As long as the 
problem of accurate binding free energy pre¬ 
diction on the basis of a given complex geom¬ 
etry has not been resolved (cf. section on scoring 
functions), computational methods establishing 
quantitative structure-activity relationships 
(QSARs) to estimate relative binding affinity 
differences within a set of ligands remain a 
pragmatic alternative. Both classical and 3D 
QSAR methods have been developed as ligand-based
approaches (235-237). They rely exclusively
on ligand information and try to correlate
experimental binding data with features
described by a set of relevant descriptors. In 
3D QSAR, such as CoMFA (Comparative Mo¬ 
lecular Field Analysis), these descriptors are 
essentially virtual interaction energies (van 
der Waals and coulombic), calculated using an 
appropriate probe atom placed at the intersec¬ 
tions of a regularly spaced grid surrounding 
the molecules. The model derived from differ¬ 
ences in the various interaction fields provides 
a quantitative spatial description of those mo¬ 
lecular properties that matter for binding. 
They can be interpreted as a surrogate repre¬ 
sentation of the binding site. Essential for the 
success of all 3D QSAR approaches is an ap¬ 
propriate alignment of the ligands: their rela¬ 
tive spatial superposition must reflect the dif¬ 
ferences in binding geometry also experienced 
at the binding site of the structurally un¬ 
known protein. Various strategies have been 
developed to achieve this goal (235, 236). In¬ 
creasingly, however, these methods are also 
applied if the receptor structure is known. 
This results in "receptor-based 3D QSAR," a 
combination of a ligand-based QSAR approach 
with information extracted from receptor 
structures (238). This additional information
is used to generate a ligand alignment based
on the experimental or predicted binding
mode of the ligands in the binding site. The
standard 3D QSAR techniques are subsequently
used to derive a correlation model and
to ultimately predict the binding affinity of
new, appropriately aligned ligands (239). As a
practical advantage, receptor-based 3D QSAR 
provides important information as to which of 
the protein-ligand interactions are responsi¬ 
ble for the variance in biological activity 
among the given set of ligands. 
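
To make the notion of field descriptors more concrete, the following sketch evaluates a steric (12-6 Lennard-Jones) and a coulombic probe energy at every point of a regular grid around one aligned ligand. It is an illustration under stated assumptions rather than the CoMFA implementation: the probe parameters, the energy truncation value, and the function names are chosen here for the example.

```python
import math

# Conventional factor converting e^2/angstrom to kcal/mol.
COULOMB_CONSTANT = 332.06


def _clip(energy, cutoff):
    """Truncate an energy to the interval [-cutoff, +cutoff]."""
    return max(-cutoff, min(cutoff, energy))


def probe_energies(atoms, point, probe_charge=1.0, epsilon=0.1, r_min=3.4,
                   cutoff=30.0):
    """Steric and coulombic energy (kcal/mol) of a probe atom at `point`.

    `atoms` is a list of (x, y, z, partial_charge) tuples for one ligand.
    All parameter values are illustrative assumptions.
    """
    steric = 0.0
    coulomb = 0.0
    for x, y, z, q in atoms:
        r = max(math.dist((x, y, z), point), 1e-3)  # avoid division by zero
        ratio = r_min / r
        steric += epsilon * (ratio ** 12 - 2.0 * ratio ** 6)
        coulomb += COULOMB_CONSTANT * probe_charge * q / r
    return _clip(steric, cutoff), _clip(coulomb, cutoff)


def field_descriptors(atoms, origin, spacing, shape):
    """Flatten both fields over a regular grid into one descriptor vector."""
    nx, ny, nz = shape
    descriptors = []
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                point = (origin[0] + i * spacing,
                         origin[1] + j * spacing,
                         origin[2] + k * spacing)
                descriptors.extend(probe_energies(atoms, point))
    return descriptors
```

Computing one such vector per aligned ligand yields the descriptor matrix that is then correlated with the measured activities. Because every column refers to a fixed position in space, the result depends directly on the alignment, which is why the superposition of the ligands is so critical.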

Obviously, in the case of known receptor 
structure, the ligand alignment can be ob¬ 
tained by docking. This strategy has indeed 
been followed in a variety of studies: it has 
been used to set up CoMFA models [e.g., (240)] 
or extended to the Comparative Binding Energy
(COMBINE) analysis (241-244), which explicitly
exploits receptor information to generate
the QSAR descriptors. Furthermore, in a
GRID/GOLPE (245) analysis, the model gen¬ 
erated with the docking alignment has been 
compared to the traditional CoMFA model
based on ligand alignment (238, 246); the
alignment generated by docking could be 
shown to exhibit higher relevance. 

Another concept to combine docking with 
QSAR has recently been proposed by Vieth 
and Cummins in their DoMCoSAR approach 
(247). DoMCoSAR is used to statistically de¬ 
termine the docking mode that is consistent 
with a structure-activity relationship, based 
on the explicit assumption that all molecules 
exhibit the same binding mode. In a first step, 
all molecules of a chemical series with common
substructure are docked in an unbiased
way to the protein binding site and the results
are clustered to establish the most favorable 
docking modes for the common substructure. 
Subsequently, constrained docking is per¬ 
formed by forcing all molecules to align with 
the common substructure in the major docking
modes. In a final stage, interaction-energy-based
descriptors are calculated for all
major docking modes. QSAR models are then 
derived to determine the statistically signifi¬ 
cant and most predictive set of descriptors and 
thus the docking mode that is most consistent 
with a given structure-activity relationship.
As noted by the authors, the appeal of this 
method is that an objective statistical justifi¬ 
cation for the selection of a binding mode is 
obtained. This may especially prove useful in 
cases where the primary docking scores yield 
nearly degenerate multiple binding modes and 
a selection of the most representative result is
difficult. However, because one alignment is 
rendered prominent among others for the 
sake of best agreement with the derived QSAR 
model, the danger exists that unconsidered or 
ill-defined descriptors in the QSAR could pos¬ 
sibly distort the final or accepted alignment. 




3.2.5 Docking and Homology Modeling. In
the absence of an experimental protein structure,
a homology model may be used for docking
and structure-based design. Such a model
can be generated by comparative modeling
based on homologous proteins of known structure.
Obviously, it is most reliable in the regions
of highest homology between the templates
and the target protein. Although an
overall skeleton of the target protein can frequently
be obtained with sufficient accuracy,
the structural details of the binding site are
often beyond the scope of the method. In fact,
members of a homologous protein family may 
show considerable differences in the binding 
region. Accordingly, homology models may 
not be sufficiently accurate to apply standard 
docking tools, and special methods addressing 
the docking of ligands to low-resolution struc¬ 
tures have been presented (248). 

Clearly, flexible-receptor docking could 
help to alleviate the problem. A frequently fol¬ 
lowed alternative is to refine the initial com¬ 
plex between the protein model and the li¬ 
gand, most commonly by relaxation with MD 
simulations (249-251). This may also be combined
with free energy calculations to determine
the binding mode most consistent with
experimental affinity data (252). However, re¬ 
finement does not overcome the problem that 
the initial conformation of the model may pre¬ 
clude the binding of certain ligands. This has, 
for example, been demonstrated by Schapira 
et al. in a virtual screening for retinoic acid 
receptor (RAR) antagonists based on an RAR 
homology model (201). The automatic selec¬ 
tion procedure based on flexible ligand dock¬ 
ing was followed by optimization of the se¬ 
lected candidates with flexible protein side 
chains using the ICM program (82, 116, 117).
Nevertheless, some known ligands were re¬ 
peatedly missed by the screening algorithm 
because of incompatible binding site confor¬ 
mations. Consideration of side-chain flexibil¬ 
ity already in the initial docking simulation 
was required to accommodate these ligands. 

An approach developed especially for the 
purpose of docking ligands into approximate 
protein models generated by homology model¬ 
ing is the DragHome method (253). The bind¬ 
ing site is analyzed in terms of putative ligand 
interaction sites and translated using Gaussian
functions into a functional binding-site description
represented by physicochemical
properties. Similarly, ligands are translated 
into a description based on Gaussian functions 
and the docking is computed by optimizing the
overlap between the two functional descrip¬ 
tions. The use of "soft" Gaussian functions to 
describe protein-ligand interactions is one 
possibility to take into account the limited ac¬ 
curacy of modeled structures for the purpose 
of docking. The method for generating and optimizing
ligand orientations relative to the
binding-site representation was adapted from 
the ligand alignment program SEAL 
(254-256). For a set of different ligands, the 
generated solutions are analyzed with respect 
to the mutual ligand alignment. This align¬ 
ment is then used to generate 3D QSAR mod¬ 
els, which in turn can be interpreted with re¬ 
spect to the surrounding protein model. This 
can highlight inconsistencies and deficiencies
present in the model. In future developments of
the method, this information is planned to be fed
back into a subsequent
modeling step to improve the protein model.
The idea behind this is that the cycle of dock¬ 
ing and alignment, ligand data analysis (3D 
QSAR), and protein structure modeling 
should be repeated until self-consistency is 
achieved. This would provide a protein homol¬ 
ogy model optimized with respect to the bind¬ 
ing site and suitable to obtain consistent dock¬ 
ing results. 
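
The principle of scoring by the overlap of "soft" Gaussian functions can be illustrated with a short sketch. It is a minimal example in the spirit of Gaussian-based alignment functions, not the published DragHome or SEAL scoring scheme: the point representation, the single property weight per point, the exponent `alpha`, and the function names are all assumptions made here.

```python
import math


def gaussian_overlap(site_points, ligand_points, alpha=0.3):
    """Similarity between two sets of property-weighted points.

    Each point is (x, y, z, weight). Every site-ligand pair contributes
    weight_site * weight_ligand * exp(-alpha * r^2), so the score changes
    smoothly with displacement instead of dropping abruptly.
    """
    score = 0.0
    for sx, sy, sz, sw in site_points:
        for lx, ly, lz, lw in ligand_points:
            r2 = (sx - lx) ** 2 + (sy - ly) ** 2 + (sz - lz) ** 2
            score += sw * lw * math.exp(-alpha * r2)
    return score


def translate(points, shift):
    """Rigidly shift a point set by the vector `shift`."""
    dx, dy, dz = shift
    return [(x + dx, y + dy, z + dz, w) for x, y, z, w in points]
```

A ligand displaced by about 1 Å from its ideal position still retains most of its score in such a scheme. This tolerance is the property that makes Gaussian descriptions attractive for approximate models, where atom positions in the binding site are themselves uncertain.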

4 SCORING FUNCTIONS 

This section is dedicated to the scoring aspect 
of the docking problem. Various approaches 
are discussed that try to capture the essential 
elements of protein-ligand interactions in 
computationally efficient scoring functions. 
The discussion focuses on general approaches 
rather than individual functions. The reader 
is referred to Table 7.2 for original references 
to the most important scoring functions. 

4.1 Description of Scoring Functions for
Protein-Ligand Interactions 

Reversible protein-ligand binding is an equi¬ 
librium between the bound state and the un¬ 
bound state of the binding partners. The rig¬ 
orous theoretical description requires full 
consideration of all species involved: the sepa¬ 
rate solvated protein, the separate solvated li¬ 
gand, and the solvated complex, in which the 
binding partners are partially desolvated and 
form interactions with each other. The quan¬ 
tity of interest to characterize this equilibrium 
is the free energy of binding. Its most accurate 
calculations are based on the evaluation of ensemble
averages according to principles of statistical
mechanics (45). To obtain reasonably
accurate values of binding free energies, extensive
Monte Carlo or MD simulations are
necessary, which require large computational 
resources. Clearly, this is impractical for stan¬ 
dard docking applications. Furthermore, even 
the most advanced techniques are reliable 
only for calculating binding free energy differ¬ 
ences between closely related ligands (162, 
163, 257, 258). However, some less rigorous, 
but faster and, as experience shows, often not 
less accurate methods have been developed
that are suitable to handle larger numbers of
ligands. For example, continuum solvation 
models are used to replace explicit solvent 
molecules at least in the final energy evalua¬ 
tion of the simulation trajectory (259), or lin¬ 
ear response theory is applied (260-262), 
sometimes augmented by a surface term 
(263). 

Scoring functions that can be evaluated 
fast enough to be applied in docking and vir¬ 
tual screening can only estimate the free en¬ 
ergy of binding. They usually take into ac¬ 
count only one possible configuration of the 
receptor-ligand complex and disregard ensem¬ 
ble averaging and explicit properties of the un¬ 
bound states of the binding partners. Further¬ 
more, all methods share the assumption that 
the free energy can be decomposed into a sum 
of terms (additivity). In a strict physical sense, 
this is not allowed, given that the free energy 
of binding is a state function, although its 
components are not (77, 264). In addition, simple
additive models cannot describe subtle cooperativity
effects (265). Nevertheless, it is often
useful to interpret receptor-ligand binding
in an additive fashion (266-268), and esti¬ 
mates of binding free energy based on the ad¬ 
ditivity assumption are often accessible at 
very low computational cost. 
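
The additivity assumption can be written schematically as shown below. The individual terms are only examples of contributions commonly found in fast scoring functions; the symbols are chosen here for illustration and do not correspond to a specific function from the literature.

```latex
\Delta G_{\mathrm{bind}} \;\approx\; \sum_{k} \Delta G_{k}
  \;=\; \Delta G_{\mathrm{vdW}} + \Delta G_{\mathrm{elec}}
      + \Delta G_{\mathrm{hbond}} + \Delta G_{\mathrm{desolv}}
      + \Delta G_{\mathrm{tors}} + \dots
```

Each term is evaluated independently from a single complex geometry, which is what makes such estimates cheap and also why cooperative effects between the terms are not captured.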

Three main classes of fast scoring functions 
can be distinguished: force field-based meth¬ 
ods, empirical scoring functions, and know¬ 
ledge-based methods. The following sections 
are dedicated to a separate discussion of each 
method. 

4.1.1 Force Field-Based Methods. An obvious
idea to circumvent parameterization efforts
for scoring is to use nonbonded energies
of existing, well-established molecular mechanics
force fields for the estimation of binding
affinity. In doing so, one substitutes estimates
of the free energy of binding in solution
by an estimate of the gas phase enthalpy of
binding. Even this crude approximation can
lead to satisfying results. A good correlation
was obtained between nonbonded interaction
energies calculated with a modified MM2 force
field and IC50 values of 33 HIV-1 protease inhibitors
(269). Similar results were reported
in a study of 32 thrombin-inhibitor complexes
with the CHARMM force field (270). In both
studies, however, experimental data represented
rather narrow activity ranges and covered
little structural variation.

Table 7.2 Overview of Currently Used Scoring Functions

Type of Function                     Name of Function            Year        Original    Selected References
                                                                 Published   References  to Applications

Force field                          CHARMM                      1998        (274)
Force field + desolvation            (Schapira, Abagyan et al.)  1999        (280)
                                     AMBER + desolvation         1999        (276)
                                     CHARMM + PB                 1999        (393)
                                     AMBER + desolvation         1999        (278)
                                     MM PB/SA                    1999        (343)       (344, 346)
Linear response                      LIE                         1994        (260)       (261, 263, 394)
Simplified free-energy perturbation  OWFEG Grid                  2001        (284)       (395)
Empirical                            (Wade, Goodford et al.)     1989, 1993  (220, 298)  GRID (218)
                                     SCORE1                      1994        (294)       LUDI (108, 109); (300, 396)
                                     (Miller, Sheridan et al.)   1994        (121)       FLOG (121)
                                     GOLD score                  1995        (176, 177)  GOLD (176, 177)
                                     PLP                         1995, 2000  (185, 367)
                                     FlexX score                 1996        (110)       FlexX (110, 130, 138, 397)
                                     VALIDATE                    1996        (307)
                                     (Jain)                      1996        (297)       Hammerhead (388)
                                     ChemScore                   1997        (80)        (295)
                                     SCORE2                      1998        (296)
                                     (Takamatsu, Itai)           1998        (398)
                                     SCORE                       1998        (293)       (225)
                                     AutoDock3                   1998        (115)       (391)
                                     Fresno                      1999        (399)       (230)
                                     ScreenScore                 2001        (287)
Desolvation terms                    HINT                        1991        (400)       (308)
                                     (Zhang, DeLisi et al.)      1997        (305)
Knowledge-based                      SMoG                        1996        (313)       SMoG (314)
                                     BLEEP                       1999        (315, 316)  (340)
                                     PMF                         1999        (317)       (299, 319, 320, 339, 369)
                                     DrugScore                   2000        (226)       (15, 318)

The AMBER (271, 272) and CHARMM 
(179) nonbonded terms are used as scoring
functions in several docking programs. As mentioned
above (Section 3.1.1), protein terms are
usually precalculated on a rectangular grid to 
speed up the energy calculation compared to 
traditional atom-by-atom evaluations (273). 
Distance-dependent dielectric constants are
usually employed to approximate the long-range
shielding of electrostatic interactions by
water (274). However, compounds with high 
formal charges still obtain unreasonably high 
scores as a result of overestimated ionic inter¬ 
actions. For this reason, a common practice in 
virtual screening is to separate databases of 
compounds into subgroups according to their 
total charges and rank these groups sepa¬ 
rately. When electrostatic interactions are 
complemented by a solvation term calculated 
by the Poisson-Boltzmann equation (32) or 
faster continuum solvation models (e.g., Ref. 
275), effects of high formal charges are usually 
leveled out. In a validation study on three pro¬ 
tein targets, Shoichet and coworkers observed 
significantly improved ranking of known in¬ 
hibitors upon correction for ligand solvation 
(276). The current version of the docking program
DOCK calculates solvation corrections
based on the generalized Born (277) solvation
model (278). The method has been tested in a
study where several peptide libraries were 
docked into various serine protease active 
sites (279). 
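
The grid precalculation and the distance-dependent dielectric described above can be illustrated with a small sketch. It is not the implementation of any particular docking program: the function names, the choice eps(r) = 4r, the 0.5 A distance clamp, and the nearest-grid-point lookup (real programs typically interpolate) are all assumptions made for illustration, and only the electrostatic term is shown.

```python
import math

COULOMB = 332.06  # kcal*A/(mol*e^2), conversion constant for charges in units of e


def coulomb_ddd(q1, q2, r, eps_scale=4.0):
    """Coulomb energy with a distance-dependent dielectric eps(r) = eps_scale * r."""
    return COULOMB * q1 * q2 / (eps_scale * r * r)


def build_grid(receptor, origin, spacing, shape):
    """Precompute the receptor potential felt by a +1 probe at every grid point.

    receptor: list of ((x, y, z), partial_charge) tuples.
    """
    grid = {}
    for ix in range(shape[0]):
        for iy in range(shape[1]):
            for iz in range(shape[2]):
                point = (origin[0] + ix * spacing,
                         origin[1] + iy * spacing,
                         origin[2] + iz * spacing)
                potential = 0.0
                for position, charge in receptor:
                    r = max(math.dist(point, position), 0.5)  # clamp to avoid singularities
                    potential += coulomb_ddd(1.0, charge, r)
                grid[(ix, iy, iz)] = potential
    return grid


def score_ligand(ligand, grid, origin, spacing):
    """Sum charge * potential over ligand atoms using the nearest grid point.

    Raises KeyError if a ligand atom lies outside the precomputed grid.
    """
    total = 0.0
    for position, charge in ligand:
        index = tuple(round((position[k] - origin[k]) / spacing) for k in range(3))
        total += charge * grid[index]
    return total
```

Because the grid is built once per receptor, scoring each ligand pose costs one lookup per ligand atom instead of one distance evaluation per receptor-ligand atom pair. The sketch also makes the charge problem visible: doubling a ligand charge doubles its electrostatic score, which is why databases are often ranked separately by total charge.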

In the context of scoring, the van der Waals 
term of force fields is mainly responsible for 
penalizing docking solutions with respect to 
overlap between receptor and ligand atoms. It 
is often omitted when only the binding of ex¬ 
perimentally determined complex structures 
is analyzed (280-282). 

Very recently, a new contribution to the list 
of force-field-based scoring methods has been 
developed by Charifson and Pearlman. This 
so-called OWFEG (one window free energy 
grid) method (283) is an approximation to the 
expensive first-principles method of free en¬ 
ergy perturbation (FEP). For the purpose of 
scoring, an MD simulation is carried out with 
the ligand-free, solvated receptor site. During 
the simulation, the energetic effects of probe 
atoms on a regular grid are collected and av¬ 
eraged. Three simulations are run with three 
different probes: a neutral methyl-type atom, 
a negatively charged atom, and a positively 
charged atom. The resulting three grids con¬ 
tain information on the score contributions of 
neutral, positively, and negatively charged 
probe atoms located in various positions of the 
receptor site. They are used for scoring a ligand
position by linear interpolation based on
the partial charges of the ligand atoms. This
approach seems to be successful for Ki prediction
as well as virtual screening applications
(284). Its conceptual advantage is the implicit 
consideration of entropic and solvent effects 
and some protein flexibility. 
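
The interpolation step can be sketched as follows. This is only one plausible reading of "linear interpolation based on the partial charges": it assumes the three probes carry charges of -1, 0, and +1, and that the grid values have already been looked up at each ligand atom position. The function names are hypothetical and do not come from the OWFEG publication.

```python
def interpolate_probe_score(charge, g_negative, g_neutral, g_positive):
    """Interpolate linearly between the neutral grid and the matching charged grid.

    charge is the ligand atom's partial charge, assumed to lie in [-1, +1].
    """
    if charge >= 0.0:
        return g_neutral + charge * (g_positive - g_neutral)
    return g_neutral - charge * (g_negative - g_neutral)


def score_pose(atoms):
    """Sum interpolated contributions over all ligand atoms of one pose.

    atoms: list of (partial_charge, g_negative, g_neutral, g_positive) tuples,
    where the three grid values were read at the atom's position.
    """
    return sum(interpolate_probe_score(*atom) for atom in atoms)
```

A neutral atom thus receives exactly the methyl-probe value, a fully charged atom the value of the corresponding charged probe, and anything in between a proportional mixture.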

The calculation of ligand strain energy tra¬ 
ditionally also lies in the realm of molecular 
mechanics force fields. Although effects of 
strain energy have rarely been determined ex¬ 
perimentally (3), it is generally accepted that 
high-affinity ligands bind in low-energy con¬ 
formations (285, 286). If a compound must 
adopt a strained conformation to fit into a re¬ 
ceptor pocket, this should lead to a less nega¬ 
tive binding free energy. Strain energy can be 
estimated by calculating the difference be¬ 
tween the global energy minimum of the un¬ 
bound ligand and the current conformation of 
the ligand in the complex. However, force field 
estimates of energy differences between indi¬ 
vidual conformations are not reliable for all 
systems. In practice, better correlation with 
experimental binding data is observed when 
strain energy is used as a filter to weed out 
unlikely binding geometries rather than in¬ 
cluding it in the final score. Estimation of li¬ 
gand strain energy based on force fields can be 
time-consuming and therefore alternatives 
are often employed, such as empirical rules 
derived from small-molecule crystal data 
(140). Conformations generated by such pro¬ 
grams are, however, often not strain-free be¬ 
cause only one torsional angle is regarded at a 
time. Some strained conformations can be ex¬ 
cluded when two consecutive dihedral angles 
are taken into account simultaneously (287). 
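
Using strain energy as a filter rather than as a score term can be sketched as below. The energies are assumed to come from some force field calculation that is not shown, and the threshold is a user choice; both the names and the example cutoff are illustrative assumptions.

```python
def strain_energy(e_bound_conformation, e_global_minimum):
    """Strain = energy of the bound conformation minus the unbound global minimum."""
    return e_bound_conformation - e_global_minimum


def filter_poses(poses, e_global_minimum, max_strain):
    """Keep only poses whose ligand strain does not exceed max_strain.

    poses: list of (pose_id, docking_score, conformer_energy) tuples.
    The docking score itself is left untouched; strain only decides pass/fail.
    """
    return [pose for pose in poses
            if strain_energy(pose[2], e_global_minimum) <= max_strain]
```

For example, with a global minimum of 10.0 and a cutoff of 5.0 (same energy units), a pose with conformer energy 12.0 passes while one with 18.5 is discarded regardless of its docking score.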

4.1.2 Empirical Scoring Functions. The un¬ 
derlying idea of empirical scoring functions is 
that the binding free energy of a noncovalent 
receptor-ligand complex can be factorized into 
a sum of localized, chemically intuitive inter¬ 
actions. Such decompositions can be a useful 
tool to gain some insight into binding phenom¬ 
ena, even without analyzing 3D structures of 
receptor-ligand complexes. Andrews and col¬ 
leagues derived average functional group con¬ 
tributions to the binding free energy by ana¬ 
lyzing a set of 200 compounds for which the 
affinity to a receptor had been experimentally 
determined (266). Such average functional
group contributions can then be used to estimate
the mean overall binding affinity of a
compound independent of a particular binding 
site. This value can be compared to the exper¬ 
imental binding free energy: if the experimen¬ 
tal affinity is similar to or even more favorable 
than the computed one, the ligand obviously 
shows a good fit with the receptor and its func¬ 
tional groups are supposedly all involved in 
interactions with the protein; on the other 
hand, if it is significantly less favorable, the 
compound apparently does not fully exploit its 
potential to form optimal interactions. Simi¬ 
larly, experimental binding affinities have 
been analyzed on a per-atom basis in quest of 
the maximal binding affinity of noncovalent 
ligands (288). It was concluded that in the 
strongest binding ligands each non-hydrogen 
atom on average contributes 6.3 kJ/mol to the
binding energy. 
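
A per-atom analysis of this kind can be reproduced with a few lines. The sketch assumes the standard relation between an inhibition constant and the binding free energy, Delta G = RT ln Ki, at 298.15 K; the function names and the example ligand are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def binding_free_energy(ki_molar, temperature=298.15):
    """Binding free energy in kJ/mol from an inhibition constant given in mol/L."""
    return R_KJ * temperature * math.log(ki_molar)


def per_heavy_atom(delta_g, n_heavy_atoms):
    """Average free-energy contribution per non-hydrogen atom."""
    return delta_g / n_heavy_atoms
```

A hypothetical 1 nM inhibitor with 25 non-hydrogen atoms has a binding free energy of roughly -51 kJ/mol, or about -2 kJ/mol per atom, well short of the per-atom value quoted above for the strongest binders.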

The analysis of binding phenomena can be 
performed with much more detail if the 3D 
structures of receptor-ligand complexes are 
available. Based on the assumption of additivity,
the binding affinity $\Delta G_{\mathrm{bind}}$ can be estimated
as a sum of interactions multiplied by
weighting factors:

\[ \Delta G_{\mathrm{bind}} \approx \sum_i \Delta G_i \, f_i . \]

Here, each $f_i$ corresponds to an interaction
term that depends on structural features of
the complex and each $\Delta G_i$ represents a weighting
coefficient, which is determined on the basis
of a training set of experimental affinities
for crystallographically known protein-ligand
complexes. Scoring schemes that use this concept
are called empirical scoring functions.
Several reviews summarize details of individual
parameterizations (26, 44, 56, 289-292).
The individual terms in empirical scoring
functions are usually chosen such that they
intuitively cover important contributions of
the total binding free energy. Most empirical
scoring functions are derived by evaluating
the functions $f_i$ on a set of protein-ligand complexes
and fitting the coefficients $\Delta G_i$ to experimental
binding affinities of these complexes
by multiple linear regression or supervised
learning. The relative weight of the individual
contributions depends on the training set.


Usually, between 50 and 100 complexes are 
used to derive the weighting factors. In a re¬ 
cent study it has been shown that many more 
than 100 complexes were necessary to achieve 
convergence (293). The reason for this finding
is probably the fact that the publicly available 
protein-ligand complexes fall in a few rather 
strongly populated classes. 
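
The regression step can be sketched in a few lines of plain Python. The feature columns, the "true" weights, and the training rows below are invented for illustration and are not taken from any published scoring function; the fit solves the normal equations by Gauss-Jordan elimination, which is adequate for a handful of terms but is not how one would treat large or ill-conditioned problems.

```python
def fit_weights(features, affinities):
    """Least-squares weights w such that sum_i w_i * f_i approximates each affinity."""
    n = len(features[0])
    # Augmented normal-equation matrix [F^T F | F^T y].
    a = [[sum(row[i] * row[j] for row in features) for j in range(n)]
         + [sum(row[i] * y for row, y in zip(features, affinities))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(weights, row):
    """Additive score: weighted sum of the interaction terms of one complex."""
    return sum(w * f for w, f in zip(weights, row))


# Hypothetical training set. Columns: constant term, number of hydrogen bonds,
# lipophilic contact area, number of rotatable bonds.
TRAINING_FEATURES = [
    [1, 2, 100, 3],
    [1, 4, 150, 5],
    [1, 1, 200, 2],
    [1, 3, 50, 8],
    [1, 5, 120, 1],
    [1, 0, 80, 4],
]
ASSUMED_WEIGHTS = [2.0, -4.0, -0.2, 1.0]  # made-up values used to generate the data
TRAINING_AFFINITIES = [predict(ASSUMED_WEIGHTS, row) for row in TRAINING_FEATURES]
```

Fitting this noise-free synthetic set recovers the assumed weights. With real data the recovered coefficients shift with the composition of the training set, which is the dependence described in the text.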

Empirical scoring functions usually con¬ 
tain individual terms for hydrogen bonds, 
ionic interactions, hydrophobic interactions, 
and binding entropy. Hydrogen bonds are of¬ 
ten scored by simply counting the number of 
donor-acceptor pairs that fall into a given dis¬ 
tance and angle range favorable for hydrogen 
bonding, weighted by penalty functions for de¬ 
viations from ideal standard values (80, 294- 
296). The amount of error tolerance in these 
penalty functions is critical. If large deviations 
from the ideal are tolerated, the scoring func¬ 
tion cannot discriminate sufficiently between 
different placements of a ligand, whereas too 
stringent tolerances artificially score similar 
complexes rather differently. Attempts have 
been described to reduce the strong distance 
dependency of such interactions by assigning 
soft modulating functions on an atom-pair ba¬ 
sis (297). Other concepts try to avoid penalty 
functions and introduce distinct regression co¬ 
efficients for strong, medium, and weak hy¬ 
drogen bonds (293). The Agouron group has 
used a simple four-parameter potential that is 
a piecewise linear approximation of a potential 
neglecting angular terms ("PLP scoring func¬ 
tion") (185). Most functions consider all types 
of hydrogen bonds equivalently. Some at¬ 
tempts have been made to distinguish be¬ 
tween different donor-acceptor functional 
group pairs. Hydrogen bond scoring in GOLD 
(176,177) is based on a list of hydrogen bond 
energies, derived from ab initio calculations, 
for any combination of 12 donor and 6 accep¬ 
tor atom types. A similar differentiation of do¬ 
nor and acceptor groups is attempted in the 
program GRID (218) for the characterization 
of binding sites (219,220, 298). The consider¬ 
ation of such lookup tables in scoring func¬ 
tions might help to avoid false predictions 
originating from an oversimplification of some 
individual interactions. 
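
A penalty function of the kind described above can be sketched as a product of two piecewise linear ramps, one for the distance deviation and one for the angle deviation. The ideal geometry (1.9 A, 180 degrees) and the tolerance values are illustrative assumptions, chosen only to show how the tolerance width controls the discrimination between placements.

```python
def ramp(deviation, full_score_until, zero_score_from):
    """1.0 up to the first threshold, falling linearly to 0.0 at the second."""
    if deviation <= full_score_until:
        return 1.0
    if deviation >= zero_score_from:
        return 0.0
    return 1.0 - (deviation - full_score_until) / (zero_score_from - full_score_until)


def hbond_term(distance, angle_deg, ideal_distance=1.9, ideal_angle=180.0):
    """Geometry-dependent weight in [0, 1] for one donor-acceptor pair."""
    delta_r = abs(distance - ideal_distance)
    delta_a = abs(angle_deg - ideal_angle)
    return ramp(delta_r, 0.2, 0.6) * ramp(delta_a, 30.0, 80.0)
```

Widening the interval between the two thresholds makes the term more forgiving of small coordinate errors but less able to separate alternative placements. Narrowing it has the opposite effect.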

Reducing the weight of hydrogen bonds localized
at the solvent-exposed rim of a binding
site is a useful concept to avoid false positives
in virtual screening. This is achieved by reduc¬ 
ing charges of surface-exposed residues in 
cases where explicit electrostatic terms are 
used (274) or by multiplying the hydrogen 
bond contribution with a factor that depends 
on the accessibility of the involved protein 
counter group (299). 

Ionic interactions are handled in a way sim¬ 
ilar to hydrogen bonds. Long-distance charge- 
charge interactions are usually neglected, and 
it is thus more appropriate to refer to salt 
bridges or charge-assisted hydrogen bonds. 
The scoring function by Boehm implemented 
in LUDI (294) assigns a stronger weight to salt 
bridges than to neutral hydrogen bonds. This 
differentiation generally proved successful in 
scoring series of thrombin inhibitors (295, 
300). However, comparable to force field scor¬ 
ing, the danger exists that highly charged mol¬ 
ecules receive overestimated scores. Experi¬ 
ence with FlexX containing a variant of 
Boehm’s scoring function has shown that 
more reliable predictions are obtained if 
charged and uncharged hydrogen bonds are 
handled equally in a virtual screening applica¬ 
tion. Similar experience has also been col¬ 
lected using the ChemScore function (80). 

Hydrophobic interactions are usually cali¬ 
brated to the size of the contact surface buried 
upon receptor-ligand complex formation. Often,
a reasonable correlation with experimental
binding energies can be achieved considering
only a surface term [see, for example,
(1, 301, 302) and the discussion in Section
2.1]. Various approximations for such surface
terms have been described, for example, as 
grid-based (294) or volume-based approaches 
(cf. the discussion in Ref. 115). Many functions 
are based on a distance-dependent summation 
over neighboring receptor-ligand atom pairs. 
Distance-dependent cutoffs have been intro¬ 
duced in various ways, either short (110) or 
longer to include atom pairs that are not in¬ 
volved in direct van der Waals contacts (80, 
185). The weighting factor $\Delta G_i$ of the hydrophobic
term depends strongly on the training
set. Supposedly, this fact has been underestimated
in the development of many empirical
scoring functions (35) because in most training
sets ligands composed of numerous donor
and acceptor groups are overrepresented
(many peptide and carbohydrate fragments). 

In most empirical scoring functions, a hy¬ 
drophobic character is attributed to several 
atom types, with equivalent weight for all hy¬ 
drophobic contributions. In a more sophisti¬ 
cated approach, the propensity of particular 
atom types to be solvent-exposed or embedded 
in the interior of a protein can be assessed by 
so-called atomic solvation parameters. These 
have been derived, for example, from experi¬ 
mental octanol/water partition coefficients 
(303, 304) or from protein crystal structures 
(305, 306). Atomic solvation parameters are 
used in the VALIDATE scoring function (307) 
and have been tested in DOCK (308). 

Entropy terms account for the restriction 
of conformational degrees of freedom of the 
ligand upon complex formation. A crude but 
useful estimate of this entropy contribution is 
the number of rotatable bonds of a ligand (294, 
296). This measure has the advantage of being 
a function of the ligand only. More sophisti¬ 
cated estimates try to take into account the 
nature of the ligand portion on either side of a 
flexible bond, particularly with respect to the 
interactions formed with the receptor (80, 
307). This concept is based on the assumption 
that purely hydrophobic contacts allow for 
more residual motion in the ligand fragments. 

4.1.3 Knowledge-Based Methods. Empirical
scoring functions regard only those interactions
that are explicitly part of the model. Less
frequent interactions are usually neglected,
even though they can be strong and specific, for
example, NH-π hydrogen bonds. To generate a
comprehensive and consistent description of all
these interactions in the framework of empirical
scoring functions would be a difficult task. However,
the exponentially growing body of structural
data on receptor-ligand complexes can be
exploited to discover favorable binding geometries.
"Knowledge-based" scoring functions try
to capture the knowledge about protein-ligand
binding that is implicitly stored in the Protein
Data Bank by means of statistical analysis of
structural data, without referring to often inconsistent
experimentally determined binding
affinities (309). They are based on the concept of
the inverse formulation of the Boltzmann law,

\[ E_{ijk} = -kT \ln(p_{ijk}) - kT \ln(Z), \]

where the energy function $E_{ijk}$ is called a potential
of mean force for a state defined by the
variables i, j, and k; $p_{ijk}$ is the corresponding
probability density; and Z is the partition
function. The second term is constant
at constant temperature T and does not
have to be regarded, given that Z = 1 can be
selected by defining a suitable reference state,
which leads to normalized probability densities
$p_{ijk}$. The inverse Boltzmann approach has
been applied to assemble potentials from databases
of protein structures to score protein
models in the context of protein structure prediction
(310). To establish a function to score
protein-ligand complexes, the variables i, j,
and k are assigned to address protein and ligand
atom types, and their interatomic distances.
The occurrence frequency of individual
contacts is a measure of their energetic
contribution to binding. If a specific contact
occurs more frequently than expected at random
or than seen in an average distribution, it is
assumed to be favorable. On the other hand, if
it occurs less frequently, repulsive or unfavorable
interaction between two atom types is anticipated.
The frequencies are thus converted
into sets of atom-pair potentials ready for further
evaluation.
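
The conversion of contact frequencies into energies can be illustrated with a minimal sketch. It assumes Z = 1, a uniform reference distribution unless another one is supplied, and energies in kcal/mol at 298.15 K; the function name and the handling of empty bins (returned as None) are choices made for this example.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def contact_energies(counts, expected=None, temperature=298.15):
    """Inverse Boltzmann: E_k = -kT ln(p_k / q_k) for each distance bin k.

    counts:   observed contact counts per bin for one atom-type pair.
    expected: reference probabilities q_k; uniform if omitted.
    Bins without observations carry no information and yield None.
    """
    kt = K_B * temperature
    total = sum(counts)
    if expected is None:
        expected = [1.0 / len(counts)] * len(counts)
    energies = []
    for count, q in zip(counts, expected):
        if count == 0:
            energies.append(None)
        else:
            energies.append(-kt * math.log((count / total) / q))
    return energies
```

Bins populated more often than the reference predicts come out negative (favorable), underpopulated bins positive, and a bin that matches the reference exactly contributes zero.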

First applications in drug research (134, 
311,312) were restricted to small data sets of 
HIV protease-inhibitor complexes and did not 
result in generally applicable scoring func¬ 
tions. Recent publications (226, 313-318), 
however, have shown the usefulness of these 
approaches. The first general-purpose func¬ 
tion using such potentials was implemented in 
the de novo design program SMoG (313, 314).

The PMF function by Muegge and Martin
(317) consists of a set of distance-dependent
atom-pair potentials $E_{ij}(r)$ that are expressed as

\[ E_{ij}(r) = -kT \ln\left[ f_j(r)\, \rho_{ij}(r) / \rho_{ij} \right]. \]

Here, r is the atom pair distance, and $\rho_{ij}(r)$ is
the number density of pairs ij in a certain radius
interval about r. This density is calculated
by the following procedure. First, a maximum
search radius is defined. This radius
describes a reference sphere around each ligand
atom j. Receptor atoms of type i are
searched within this sphere. Subsequently,
the sphere is subdivided into shells of a predefined
thickness. The number of receptor
atoms i matching each spherical shell is divided
by the volume of this shell and averaged
over all occurrences of ligand atoms j in the
evaluated data set of protein-ligand complexes.
The term $\rho_{ij}$ in the denominator is the
average density of receptor atoms i falling into
the whole reference volume. It is argued that
the spherical reference volume around each
ligand atom needs to be corrected by eliminating
the occupied volume of the ligand itself,
given that ligand-ligand interactions are not
regarded in this area. This is done by a volume
correction factor $f_j(r)$ that is a function of the
ligand atom only and gives a rough estimate of
the preference of an atom of type j to be exposed
rather than buried in the ligand.
Muegge showed that the volume-correction
factor contributes significantly to the predictive
power of the PMF function (319). Also,
reference radii between 7 and 12 Å are applied
to implicitly include solvation effects, especially
the propensity of individual atom types
to be located inside a protein cavity or in contact
with solvent (320). To rank docking solutions,
the PMF function is evaluated in a grid-based
fashion and combined with a repulsive
van der Waals potential at short distances.
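
The shell-counting procedure can be written down directly. The sketch below handles a single atom-type pair, takes plain coordinate lists as input, and leaves the volume correction as a constant factor `f_vol` (default 1.0, that is, no correction); these simplifications, the names, and the energy unit are assumptions of this example rather than details of the published PMF function.

```python
import math

KT = 0.593  # kcal/mol near 298 K (assumed unit)


def pmf_from_contacts(ligand_atoms, receptor_atoms, r_max, dr, f_vol=1.0):
    """Return (density ratios, energies) per shell for one ligand/receptor type pair.

    ligand_atoms, receptor_atoms: lists of (x, y, z) coordinates of the two types.
    """
    n_shells = int(round(r_max / dr))
    counts = [0] * n_shells
    for lig in ligand_atoms:
        for rec in receptor_atoms:
            d = math.dist(lig, rec)
            if d < r_max:
                counts[min(int(d / dr), n_shells - 1)] += 1
    if sum(counts) == 0:
        raise ValueError("no receptor atoms inside the reference sphere")
    n_lig = len(ligand_atoms)
    sphere_volume = 4.0 / 3.0 * math.pi * r_max ** 3
    rho_bulk = sum(counts) / (sphere_volume * n_lig)  # average density in the sphere
    ratios, energies = [], []
    for k, count in enumerate(counts):
        shell_volume = 4.0 / 3.0 * math.pi * dr ** 3 * ((k + 1) ** 3 - k ** 3)
        ratio = (count / (shell_volume * n_lig)) / rho_bulk
        ratios.append(ratio)
        # Empty shells carry no statistics; they are reported as None.
        energies.append(-KT * math.log(f_vol * ratio) if count > 0 else None)
    return ratios, energies
```

Dividing by the shell volume matters: outer shells are much larger than inner ones, so the same raw count corresponds to a far lower density at long range.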

The DrugScore function by Gohlke et al.
(226) is based on roughly the same formalism,
albeit with several differences in the derivation
that lead to different potential forms.
Most notably, the statistical distance distributions
$\rho_{ij}(r)/\rho_{ij}$ for the individual atom pairs ij
are divided by a common reference state that
is taken as the average over the distance distributions
of all atom pairs, $\rho(r) = \sum_i \sum_j \rho_{ij}(r)/ij$.
To consider only direct ligand-protein contacts,
the upper sample radius has been set to
6 Å. At this distance, no further atoms (e.g., of
a water molecule) can mediate a protein-ligand
interaction. The individual potentials
have the form

\[ E_{ij}(r) = -kT \left( \ln\left[ \rho_{ij}(r)/\rho_{ij} \right] - \ln\left[ \rho(r) \right] \right). \]

These pair potentials are used in combination
with potentials depending on single (protein or
ligand) atom types that express the propensity
of an atom type to be buried within a particular
protein environment on complex formation.
Contributions of these surface potentials and
the pair potentials are weighted equally in the
final scoring function. This scoring function was
initially developed with the primary goal of
differentiating between correctly docked (near-native)
ligand poses and decoy binding modes
for the same protein-ligand pair. However,
through appropriate scaling, quantitative
estimates across different protein-ligand complexes
are also possible (318).
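
The role of the common reference state can be made concrete with a short sketch. It assumes that each atom-type pair is already represented by a normalized distance distribution on a shared set of bins; the dictionary layout, the atom-type labels in the usage note, and the convention of returning 0.0 for bins without data are assumptions of this example.

```python
import math

KT = 0.593  # kcal/mol near 298 K (assumed unit)


def reference_state(distributions):
    """Bin-wise average over the normalized distributions of all atom-type pairs."""
    n_pairs = len(distributions)
    n_bins = len(next(iter(distributions.values())))
    return [sum(g[b] for g in distributions.values()) / n_pairs
            for b in range(n_bins)]


def pair_potential(g_ij, g_ref):
    """E_ij(r) = -kT (ln g_ij(r) - ln g_ref(r)); bins without data are set to 0.0."""
    return [-KT * (math.log(g) - math.log(ref)) if g > 0 and ref > 0 else 0.0
            for g, ref in zip(g_ij, g_ref)]
```

With two hypothetical pairs, `{("O.3", "N.3"): [0.1, 0.6, 0.3], ("C.3", "C.3"): [0.3, 0.2, 0.5]}`, the reference state is `[0.2, 0.4, 0.4]`, so the first pair is favorable only in the middle bin, where it is enriched relative to the average over all pairs.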

Mitchell and coworkers choose a different
type of reference state for their BLEEP potential
(315). The pair interaction energy is written
as

\[ E_{ij}(r) = kT \ln\left[ 1 + m_{ij}\sigma \right] - kT \ln\left[ 1 + m_{ij}\sigma\, \rho_{ij}(r)/\rho(r) \right]. \]

Here, the number density $\rho_{ij}(r)$ is defined as
above, but it is normalized by the occurrence
frequency of all atom pairs at this same distance
instead of by the number of pairs ij in
the whole reference volume. The variable $m_{ij}$
is the number of pairs ij found in the evaluated
data set, and $\sigma$ is an empirical factor that defines
the weight of each observation. This potential
is combined with a van der Waals potential
as a reference state to compensate for
the lack of sampling at short distances and for
certain underrepresented atom pairs.
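
The effect of the observation weight in this expression is easiest to see numerically. The sketch below evaluates the formula as written above; the function name, the argument names, and the energy unit are assumptions of this example.

```python
import math


def sparse_data_energy(m_ij, sigma, rho_ij_r, rho_r, kt=0.593):
    """E_ij(r) = kT ln[1 + m*sigma] - kT ln[1 + m*sigma*rho_ij(r)/rho(r)]."""
    weight = m_ij * sigma
    return kt * math.log(1.0 + weight) - kt * math.log(1.0 + weight * rho_ij_r / rho_r)
```

With no observations (m_ij = 0) the energy is exactly zero, so sparsely sampled pairs contribute little. As the number of observations grows, the expression approaches the plain inverse Boltzmann value -kT ln[rho_ij(r)/rho(r)].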

Besides differences in the functional form 
and reference state, from a more practical 
point of view, the knowledge-based potentials 
differ also with respect to scope of atom type 
definitions and the amount of structural data 
used for their derivation. The number of dif¬ 
ferent atom types ranges from 17 in Drug- 
Score to 40 nonmetal atom types in BLEEP. In 
all cases, the Protein Data Bank (321) was the 
source of the solved crystal structures. For 
BLEEP 351 selected complexes were used, 
whereas the PMF function was extracted from 
697 complexes, and DrugScore was derived us¬ 
ing 1376 complexes. In the latter case, the data 
have been extracted from Relibase (322,323). 

4.2 Critical Assessment of Current 
Scoring Functions 

4.2.1 Influence of the Training Data. All
fast scoring functions share a number of deficiencies
that one should be aware of in any
application. First, most scoring functions are 
in some way fitted to or derived from experi¬ 
mental data. The functions necessarily reflect 
the accuracy of the data that were used for 
their derivation. For instance, a general prob¬ 
lem with empirical scoring functions is the 
fact that the experimental binding energies 
usually originate from many different sources 
and therefore consist of a rather heteroge¬ 
neous data set affected by all kinds of experi¬ 
mental errors. Furthermore, scoring func¬ 
tions mirror not only the quality but also the 
scope of experimental data used for their de¬ 
velopment. Virtually all scoring functions are 
still derived from data mostly based on high-affinity
receptor-ligand complexes. Many of
these are still of peptidic nature, whereas in¬ 
teresting leads in pharmaceutical research are 
non-peptidic. This is reflected in the relatively 
high contributions of hydrogen bonds in the 
total score. The balance between hydrogen 
bonding and hydrophobic interactions is a 
very critical issue in scoring, and its conse¬ 
quences are especially obvious in virtual 
screening applications, as illustrated in Sec¬ 
tion 5.3. 

4.2.2 Molecular Size. The simple additive 
nature of most fast scoring functions often
leads to gradually increasing scores for molecules
of larger size. Although it is true that
small molecules with a molecular weight be¬ 
low 200-250 rarely show very high affinity, 
there is no physical reason why larger com¬ 
pounds should automatically possess higher 
activity. Comparing the scores of two com¬ 
pounds of significant size difference therefore 
calls for a term that compensates the size de¬ 
pendency. In some applications, a constant 
"penalty" term has been added to the score for 
each heavy atom (324) or a term proportional 
to the molecular weight has been considered 
(325). The empirical scoring function imple¬ 
mented in the docking program FLOG has 
been normalized to remove the linear dependency
of the crude score on the number of ligand
atoms (121). Originally introduced to improve
the correlation between experimental
and calculated affinities, entropy terms reflecting
the change in conformational mobility
upon ligand binding also help to reduce an excessive
score for overly large and flexible molecules
(80, 294). The size of the solvent-accessible
surface of the ligand in its bound state
can also be used as penalty term to discard 
large ligands not fully buried in the binding 
site. It should be noted, however, that all these 
approaches are very pragmatic in nature and 
do not solve the problem of size dependency, 
which is closely related to a proper under¬ 
standing of cooperativity effects (265). 
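
Two of the pragmatic corrections mentioned above, a constant penalty per heavy atom and a per-atom normalization, can be sketched as follows. The sketch assumes that more negative scores are better; the penalty value is arbitrary and would have to be calibrated for a given scoring function.

```python
def size_penalized_score(raw_score, n_heavy_atoms, penalty_per_atom=0.3):
    """Add a constant positive penalty for every heavy atom (lower score = better)."""
    return raw_score + penalty_per_atom * n_heavy_atoms


def per_atom_score(raw_score, n_heavy_atoms):
    """Normalize the score by ligand size."""
    return raw_score / n_heavy_atoms
```

For a hypothetical small ligand (raw score -30, 20 heavy atoms) and a large one (raw score -36, 50 heavy atoms), the raw ranking favors the large compound, whereas both corrections place the small compound first. Neither correction has a physical basis; they only shift where the size bias sits.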


4.2.3 Penalty Terms. In general, scoring 
functions reward favorable interactions such 
as hydrogen bonds, but rarely penalize unfa¬ 
vorable ones. They are derived from experi¬ 
mentally determined crystal structures, and 
thus nonnative and energetically unfavorable 
orientations of a ligand within the binding site 
are not observed and can hardly be accounted 
for in a regression-based scoring function. 
Knowledge-based scoring functions try to cap¬ 
ture such effects by referring to a reference 
state that corresponds to a mean situation. At 
first glance, the neglect of angular terms in the 
compilation of knowledge-based scoring func¬ 
tions results in averaged pair potentials that 
may not discriminate sufficiently between dif¬ 
ferent binding geometries. However, some de¬ 
gree of angular dependency is considered, 
given that pair potentials for different atom 
types are always evaluated in combination 
with each other (226). Obvious deficiencies in 
regression-based scoring functions, such as 
electrostatic repulsions and steric clashes, can 
be avoided by defining reasonable penalty 
terms or by importing them from molecular 
mechanics force fields. This has been realized 
in the "chemical scoring" function implemented
in the docking program DOCK (106,
126, 137, 273, 326, 327), which is a modified
van der Waals potential made attractive or
repulsive between individual groups of donor,
acceptor, and lipophilic receptor and ligand
atoms (224, 328). Other situations cannot be
avoided by simple "clash" terms, but require a
more sophisticated analysis of binding geometry.
Among these are incomplete steric filling
of the binding cavity by the ligand,
an unreasonably large ligand surface
area remaining solvent exposed in the complex,
or the formation of voids at the receptor-ligand
interface. Possible approaches to resolve
these shortcomings are empirical filters
that detect such unsatisfactory solutions and
remove them according to user-specified
thresholds (329). A promising approach to 
properly reflect such cases is the inclusion of 
artificially generated, erroneous, decoy solu¬ 
tions in the optimization of scoring functions 
as reported for the scoring function of a flexi¬ 
ble ligand superposition algorithm (330,331). 

4.2.4 Specific Attractive Interactions. An¬ 
other general deficiency of scoring functions is 
the simplified description of attractive inter¬ 
actions. Molecular recognition is not entirely 
based on hydrogen bonding and hydrophobic 
contacts. Especially in host-guest chemistry, 
other specific types of interactions are fre¬ 
quently used to characterize the observed phe¬ 
nomena. For example, hydrogen bonds are 
formed between acidic protons and π-systems
(332). These bonds can substitute for conven¬ 
tional hydrogen bonds in strength and speci¬ 
ficity, as has been noted in protein-DNA rec¬ 
ognition (333). Another type of less frequently 
observed interaction is the cation-π interaction,
which is especially important at the surface
of proteins (39, 334). Current empirical
scoring functions usually neglect these inter¬ 
actions. Similarly, the directionality of inter¬ 
actions between aromatic rings is hardly con¬ 
sidered (335, 336). Because of the regression- 
type adjustment, some energy contributions 
originating from these interactions are al¬ 
ready implicitly incorporated into the conven¬ 
tional interaction terms. This might be one 
explanation why hydrogen bond contributions 
are traditionally overestimated in regression- 
based scoring functions. Knowledge-based ap¬ 
proaches automatically incorporate these in¬ 
teractions in a scoring function, provided they 
occur with reasonable frequency in the data 
set used to develop the potentials. 

4.2.5 Water Structure and Protonation 
State. Uncertainties about protonation states 
and the involvement of water in ligand bind¬ 
ing further complicate scoring. These consid¬ 
erations are relevant for the development as 
well as the application of scoring functions. As 
mentioned above, the entropic and enthalpic 
contributions involving the reorganization of 
water molecules upon ligand binding are very 
difficult to predict (see, for example, Ref. 337). 
Currently, the most pragmatic approach to
handling the water problem is to identify
"conserved water molecules" and to consider
them as part of the receptor. A knowledge-based
tool to estimate the "conservation" of
water molecules upon ligand binding has been
developed (217) and incorporated into a docking
procedure (111) (cf. Section 3.2.2). It is
based on crystallographic information and 
tries to extract rules about water sites by an¬ 
alyzing whether they are recurrently occupied 
by water molecules in series of related pro¬ 
tein-ligand complexes. 

Scoring functions require predefined atom 
types for each protein and ligand atom. This 
also implies the fixed assignment of a proton¬ 
ation state to each acidic and basic group. 
Knowledge-based functions, which do not con¬ 
sider hydrogen atoms, are equally affected by 
the problem because the atom type definitions 
normally imply a certain protonation state. 
Presently, such estimates might be reliable 
enough for the situation in aqueous solution; 
however, significant pKa shifts are possible
upon ligand binding (338) as a result of strong 
changes of the local dielectric conditions. They 
give rise to protonation reactions in parallel to 
the binding process. With respect to scoring, 
switching from a donor to an acceptor func¬ 
tionality because of altered protonation states 
has important consequences (279). Accord¬ 
ingly, improved docking and scoring algo¬ 
rithms must incorporate a more detailed and 
flexible description of protonation states. 

4.2.6 Performance in Structure Prediction 
and Rank Ordering of Related Ligands. Similar 
to the broad range of available docking tools 
(cf. Section 3.2.3), the multitude of different 
scoring schemes calls for an objective assess¬ 
ment to evaluate their scope and limitations. 
This depends in part on the anticipated appli¬ 
cation; that is, whether protein-ligand com¬ 
plexes should be predicted (using the scoring 
scheme as objective function in docking), 
whether a set of ligands should be ranked with 
respect to one target protein (Ki prediction), or
whether the scoring function is used to select 
possible hits out of a large database of candi¬ 
date molecules (virtual screening). 

An objective assessment of the available 
scoring functions is difficult because only very 
few functions have been tested on the same 
data sets or with the same docking tool. For 
structure prediction, several studies have
shown that knowledge-based scoring func¬ 
tions are at least equivalent to regression- 
based functions. The PMF function has been 
successfully applied to structure prediction of 
inhibitors of neuraminidase (339) and MMP3 
(229) in combination with the program 
DOCK, yielding superior results to the DOCK 
force field and chemical scoring. The DrugScore
function was tested on a large set of PDB
complexes and gave significantly better re¬ 
sults than those of the original FlexX scoring 
function using solutions generated by FlexX 
as the docking engine. DrugScore performed 
similarly to the force field score in DOCK, but 
outperformed the chemical scoring (226). 
Moreover, with respect to the correlation be¬ 
tween experimental and calculated binding 
energies, very promising results have been ob¬ 
tained with DrugScore (318) and PMF (229, 
317, 319, 339). BLEEP has recently been
tested for scoring docked protein-ligand complexes
(340). It was found to be slightly better
than the DOCK energy function in discrimi¬ 
nating decoy situations from near-native bind¬ 
ing modes. 

Although in many docking programs the 
same function is applied as an objective func¬ 
tion for structure generation and for energy 
evaluation, better results can sometimes be
obtained if different functions are applied. In 
particular, the docking objective function can 
be adapted to the docking algorithm used. In a 
parameter study, Vieth et al. found that using 
a soft-core van der Waals potential made their 
MD-based docking algorithm more efficient 
(274). Using FlexX as the docking engine, we 
observed that the original FlexX scoring func¬ 
tion emphasizes directional interactions 
(mostly hydrogen bonds) in the docking phase.
Subsequently, the ranking of individual li¬ 
brary entries can be done successfully with a 
simple PLP potential that lacks directional 
terms, but considers general steric fit of recep¬ 
tor and ligand. Results are significantly worse 
if PLP is used already in the incremental 
build-up procedure of the docked ligand.

It is even more difficult to draw valid con¬ 
clusions about the relative performance of 
scoring functions to rank sets of inhibitors 
with respect to their binding affinity for the 
same target. First, there is hardly any published
study in which different functions have
been applied to the same data sets. Second, 
experimental data are often not measured under
the same conditions but collected from
various literature references. This retrieval
from various sources usually implies larger
uncertainties within the experimental data.

The task of ranking sets of 10-100 related 
ligands with respect to one target can also be 
handled by computationally more demanding 
methods. The most general approaches are 
probably force field scores complemented by 
electrostatic desolvation and surface area 
terms. An example is the MM-PBSA method 
that combines Poisson-Boltzmann electro¬ 
statics with AMBER molecular mechanics cal¬ 
culations and MD simulations (341,342). This 
method has recently been applied to an increasing
number of examples, showing quite
promising results (343-346). Poisson-Boltz¬ 
mann calculations have been performed on a 
variety of targets with many related computa¬ 
tional protocols (280-282, 347-350). Alterna¬ 
tively, extended linear response protocols 
(263) can be used. The OWFEG grid method 
by Pearlman has also shown very promising 


5 VIRTUAL SCREENING





As outlined in section, virtual screening is a 
multistep process. Although, in principle, the 
whole process can be fully automated, it is 
highly advisable to allow for manual interven¬ 
tions, in that visual inspection and selection 
still play a major role. 

The process usually starts with a detailed 
analysis of the available 3D protein structures.
If possible, highly homologous structures
will also be analyzed, either to generate
additional ideas about possible ligand struc¬ 
tural motifs or to gain some insight on how to 
achieve selectivity against other proteins of 
the same class. A superposition of different 
protein-ligand complexes provides some ideas
about key interactions repeatedly found in
tight-binding protein-ligand complexes. Such
an overlay will also highlight flexible parts of
the protein or recurring water molecules in
the binding site that could be included in the
docking process. Tools such as Relibase (322, 
323) may be used to perform these compara¬ 
tive analyses of protein-ligand complexes in 
an efficient way. Subsequently, programs like 
GRID (218), LUDI (108, 109), Superstar (351, 
352), or DrugScore (318) are used to visualize 
potential binding sites ("hotspots") in the ac¬ 
tive site; in principle, any scoring function 
could be used for this purpose. 

An important result of the 3D structure 
analysis is usually the identification of one or 
more key interactions that all ligands should 
satisfy. In aspartic proteases, for example, in¬ 
hibitors should form at least one hydrogen 
bond to the catalytic Asp side-chains, whereas 
in metalloproteinases a coordination to the 
metal seems mandatory. Sometimes, a known 
ligand portion is used as initial scaffold based 
on which virtual screening techniques search 
for optimal side-chains. In principle, this step 
is not required, and instead one could fully 
rely on the docking and scoring step. However, 
following a pragmatic approach, it is impor¬ 
tant to use any well-founded information that 
is available about the system under consider¬ 
ation because more valuable results can usu¬ 
ally be expected this way. 

Once a reasonable hypothesis about the 
binding-site requirements has been gener¬ 
ated, the next level of virtual screening is ap¬ 
proached. Whether databases of commercially 
available compounds or "virtual" libraries of 
designed compounds are screened, it is advis¬ 
able not to dock every possible compound, but 
only those that pass a series of hierarchical 
filters (cf. also Fig. 7.3). Simple preliminary 
filters remove 

• compounds with reactive groups such as 
-SO2Cl or -CHO because they are expected
to cause problems in some biological assays 
as a result of unspecific covalent binding to 
the protein. 

• compounds with molecular weights below
150 or above 500. Small molecules such as
benzene are known to bind to proteins
rather unspecifically at several sites. Large
molecules such as polypeptides are difficult
to optimize subsequently, given that good
bioavailability is usually limited to compounds
with molecular weights below 500.

• compounds termed as "non-drug-like" according
to criteria extracted from known
drugs (353, 354).

Figure 7.3. Hierarchical filtering process in virtual screening for carbonic anhydrase inhibitors.
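
The preliminary filters listed above can be sketched in a few lines. This is only a minimal illustration, assuming that each candidate already carries a precomputed molecular weight, a set of functional-group labels, and the verdict of a separate drug-likeness model. The `Candidate` record, the label strings, and the `drug_like` flag are hypothetical placeholders and not part of any program cited in the text.

```python
from dataclasses import dataclass, field

# Molecular weight window taken from the text; group labels are illustrative.
MIN_MW, MAX_MW = 150.0, 500.0
REACTIVE_GROUPS = {"sulfonyl_chloride", "aldehyde"}  # -SO2Cl, -CHO


@dataclass
class Candidate:
    name: str
    mol_weight: float
    groups: set = field(default_factory=set)  # hypothetical functional-group labels
    drug_like: bool = True  # assumed result of a separate drug-likeness model


def passes_prefilter(c: Candidate) -> bool:
    """Apply the three simple preliminary filters described in the text."""
    if c.groups & REACTIVE_GROUPS:
        return False  # risk of unspecific covalent binding in assays
    if not (MIN_MW <= c.mol_weight <= MAX_MW):
        return False  # too small (unspecific binding) or too large
    return c.drug_like


library = [
    Candidate("benzene", 78.1),
    Candidate("acyl_probe", 310.4, {"aldehyde"}),
    Candidate("lead_like", 342.0, {"sulfonamide"}),
    Candidate("peptide", 912.7),
]
survivors = [c.name for c in library if passes_prefilter(c)]
print(survivors)
```

Only compounds that survive such inexpensive checks would be passed on to the more costly pharmacophore, similarity, and docking steps.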

After this general preselection, it can be ad¬ 
vantageous to apply further steps of hierarchi¬ 
cal filtering. As mentioned above, this could 
involve the selection of functional groups inevitably
required to anchor a ligand to the
most prominent interaction sites. Subsequently,
the information of the "hot spot"
analysis, translated into a pharmacophore
hypothesis, can be used as matching criterion
for a fast database screen. Such tools either
involve fast tweak searching (355) or scan
over precalculated conformers of the candidate
molecules (356). The list of prospective
hits can then be submitted to a similarity
search using information about already 
known active compounds, which could either 
be ligands already structurally characterized 
by crystallography or hits from a complemen¬ 
tary HTS study. This optional step of the an¬ 
alysis tries to incorporate all available infor¬ 
mation about known hits and produces a 
reranking of the candidate molecules to be 
submitted to docking. As tools for the spatial 
similarity analysis, Feature Trees (357, 358), 
FlexS (330), and SEAL (254-256) have all 
been successfully applied. 

All remaining ligands are docked into the tar¬ 
get protein and a list of some hundred to several 
thousand small molecules, each with a com¬ 
puted score, is produced. These have to be fur¬ 
ther analyzed to discard undesirable structures. 
Selection criteria could be any of the following: 
• Lipophilicity (if not addressed before).
Highly lipophilic molecules are difficult to 
test because of their low solubility. 

• Structural class. If 50% of the docked structures
belong to one single chemical class, it
is probably not necessary to test all of them 
(359). 

• Unreasonable docked binding mode. Fast
docking tools cannot produce reliable solu¬ 
tions for all compounds; often, even some 
well-scoring compounds are simply docked 
to the outer surface of the protein or adopt 
rather strained conformations to achieve 
good surface complementarity within the 
binding pocket. Computational filters help 
to detect such situations (329). 

Finally, the selected compounds are ordered
or synthesized and then tested. If the goal is to
identify even weakly binding ligands as first
leads, sufficient sensitivity of the biological assay
has to be ensured [cf., for example, Ref. 16].
In this context it also has to be considered that
limited solubility of the hits in water or water/DMSO
mixtures often hampers affinity determinations
at high concentrations.

Successful virtual screening has to produce
a set of compounds significantly enriched with
active compounds compared to random selection.
A key parameter to assess the performance
of docking and scoring in virtual
screening is therefore, at least in theoretical
case studies, the so-called enrichment factor.
It is simply the number of active compounds in
the subset selected by docking divided by the
number of active compounds found in a randomly
selected subset of equal size. To record
such enrichment factors also for controlling
performance at the various filter steps, a set of
known active compounds is mixed with the set
of candidate molecules. This strategy, however,
requires a set of reasonable size (e.g.,
30-50 ligands), which is not always given in a
real-life virtual screening study. Furthermore,
enrichment factors are far from being
ideal indicators, particularly at later filter
steps where a (hopefully) increasing number
of active compounds detected among the entries
of the database competes with the set of
known active ligands and artificially lowers
the enrichment factor.
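
As a small worked illustration of the enrichment factor, the following sketch computes it for a ranked list in which the seeded actives are flagged. The function name and the toy ranking are assumptions made for this example only. The number of actives in a random subset is taken as its expected value, that is, the overall fraction of actives times the subset size.

```python
def enrichment_factor(ranked_is_active, n_selected):
    """Enrichment of actives in the top n_selected entries of a ranked list.

    ranked_is_active: booleans ordered from best to worst score,
    True where the compound is a known (seeded) active.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if not 0 < n_selected <= n_total or n_actives == 0:
        raise ValueError("need a non-empty selection and at least one active")
    hits_selected = sum(ranked_is_active[:n_selected])
    # Expected number of actives in a random subset of the same size.
    hits_random = n_actives * n_selected / n_total
    return hits_selected / hits_random


# Toy example: 5 seeded actives in a library of 100, ranked by docking score.
ranking = [False] * 100
for position in (0, 2, 5, 40, 90):
    ranking[position] = True

ef_top10 = enrichment_factor(ranking, 10)
print(ef_top10)
```

Here three of the five actives fall into the top 10 entries, whereas a random selection of 10 compounds is expected to contain 0.5 actives, giving an enrichment factor of 6. Selecting the whole library necessarily gives a factor of 1.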


5.1 Combinatorial Docking 

Docking of large compound collections requires
fast algorithms. If the collection is an
unstructured library of more or less unrelated
compounds, each individual molecule must be
docked independently ("sequential docking"),
and only the fastest docking methods are applicable
in this context, unless massive computer
resources are used, as in the DockCrunch
project based on the PRO-LEADS
program (360). Examples of such fast docking
tools are SLIDE (111) or the docking method
by Diller and Merz (112). Both have been developed
for database screening and library prioritization.
Before docking, it is generally advisable
to eliminate compounds that would
provide only redundant information (similarity
filters) or are very unlikely to yield high
scores. Clearly, the filter routines need to be
faster than the docking and scoring procedure,
but this is normally the case.

Complementary to initial filtering, a preor¬ 
ganization of compounds into families exhib¬ 
iting some kind of similarity has been demon¬ 
strated to improve the results of database 
screening. In the strategy shown by Su et al. 
(359), all molecules of any family are docked 
and scored, but only the best-scoring member 
of a high-ranking family is allowed to remain
in the final hit list, whereas the scores of re¬ 
lated molecules are recorded as annotations to 
this representative family member. This in¬ 
creases the diversity of the hit list and helps to 
identify a higher number of different classes of 
potential ligands. 
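
The family-based selection can be sketched as follows. This is a simplified illustration of the idea, not the published implementation of Su et al.; the family assignment is assumed to be given, lower scores are assumed to be better, and all names are hypothetical.

```python
from collections import defaultdict


def family_hit_list(scored, top_n):
    """scored: iterable of (compound, family, score); lower score = better.

    Returns up to top_n entries (representative, family, score, annotations),
    one per family, ordered by the score of the representative.
    """
    families = defaultdict(list)
    for compound, family, score in scored:
        families[family].append((score, compound))

    hit_list = []
    for family, members in families.items():
        members.sort()  # best (lowest) score first
        best_score, best = members[0]
        # Scores of related molecules are kept as annotations only.
        annotations = {name: s for s, name in members[1:]}
        hit_list.append((best, family, best_score, annotations))

    hit_list.sort(key=lambda entry: entry[2])
    return hit_list[:top_n]


docked = [
    ("a1", "amidines", -32.0), ("a2", "amidines", -35.5),
    ("a3", "amidines", -34.1), ("s1", "sulfonamides", -33.0),
    ("k1", "ketones", -20.4),
]
hits = family_hit_list(docked, top_n=2)
print([(name, fam) for name, fam, _, _ in hits])
```

Without the family grouping, the top of this toy list would be occupied by amidines alone; with it, the second slot goes to a different chemical class.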

An alternative to sequential docking can be 
followed if combinatorial libraries are evalu¬ 
ated. Quite a few programs have been specifi¬ 
cally designed for speed-up by so-called com¬ 
binatorial docking. They profit from the 
structured, incremental nature of combinato¬ 
rial libraries and the fact that molecules of a 
combinatorial library consist of a common 
core. This core is assumed to form common 
specific interactions with the receptor (possi¬ 
bly supported by experimental evidence) and 
can thus be prepositioned in the binding 
pocket in one or a few similar orientations. It 
then serves as skeleton for the addition of sub¬ 
stituents. Obviously, this step is ideally suited 
for incremental construction algorithms (361) 
and significantly reduces the complexity of the
docking problem, limiting the required com¬ 
putation time per ligand. Earlier examples of 
this combinatorial docking approach are 
PRO_SELECT (362) and CombiDOCK (324). 
The latter is based on the DOCK program and 
has recently been enhanced by a vector-based 
orientation filter, to ensure productive scaf¬ 
fold poses, and by a free-energy-based scoring 
procedure (279). Another recent combinato¬ 
rial docking procedure has been implemented 
as the FlexX^c extension in FlexX (363). It follows a
recursive scheme to traverse the combinato¬ 
rial library space efficiently. The algorithm is 
based on a tree data structure that allows the 
efficient reuse of previously calculated docking
results. FlexX^c follows the library search
tree in a depth-first manner, whereas Combi¬ 
DOCK uses a breadth-first approach to evalu¬ 
ate fragments attached to a scaffold. A general 
advantage of breadth-first searches is that 
they allow for an efficient pruning of the 
search tree based on the scoring values. 
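
The pruning advantage of a breadth-first traversal can be illustrated with a toy search. The sketch below is not the CombiDOCK algorithm; it assumes a purely additive, hypothetical fragment score (lower is better) and a fixed number of partial molecules kept per level, and it ignores the mutual fragment dependencies that arise in real docking.

```python
def breadth_first_library_search(sites, fragment_score, keep):
    """Grow substituents site by site, pruning after each level.

    sites: list of fragment lists, one list per attachment site.
    fragment_score: callable (site_index, fragment) -> score contribution,
        lower is better (a stand-in for a real docking score).
    keep: number of partial molecules retained per level.
    """
    level = [((), 0.0)]  # (fragments chosen so far, accumulated score)
    for site_index, fragments in enumerate(sites):
        expanded = [
            (chosen + (frag,), score + fragment_score(site_index, frag))
            for chosen, score in level
            for frag in fragments
        ]
        # All partial molecules at this depth are known at once, so the
        # worst ones can be discarded before the next site is considered.
        expanded.sort(key=lambda item: item[1])
        level = expanded[:keep]
    return level


toy_scores = {
    (0, "Me"): -1.0, (0, "Ph"): -3.0, (0, "OH"): -2.0,
    (1, "NH2"): -2.5, (1, "Cl"): -0.5,
}
sites = [["Me", "Ph", "OH"], ["NH2", "Cl"]]
best = breadth_first_library_search(
    sites, lambda i, f: toy_scores[(i, f)], keep=2
)
print(best)
```

A depth-first traversal would instead complete one library member at a time and could reuse the placement of the shared partial structure, but it has no complete view of a level on which to base such a cutoff.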

De novo design tools have also been adapted 
to the problem of combinatorial docking and 
combinatorial library design. The program 
LUDI, for example, has been enhanced by the 
ability to connect building blocks in a chemi¬ 
cally and structurally adequate manner; it can 
thus be used for combinatorial docking by fit¬ 
ting building blocks onto the interaction sites 
and simultaneous linking to previously docked 
core fragments (300). It has been successfully 
applied in the design of new thrombin inhibi¬ 
tors accessible through a single reaction. An¬ 
other example is a variant of the Builder pro¬ 
gram (364) that was used to select substituents 
for a library of cathepsin D inhibitors (12). Yet 
another approach is DREAM++, a suite of programs
for the design of virtual combinatorial libraries
(365). Here, the DOCK algorithm is used
for the molecular placement. Variable frag¬ 
ments are joined consecutively in compliance 
with predefined types of well-characterized or¬ 
ganic reactions. Speed-up is achieved by pre¬ 
serving ("inheriting") information about com¬ 
mon partial structures across different 
reactions, such that only the conformations of
newly added fragments are searched. 

Generally speaking, combinatorial docking 
approaches work best in cases where a core 
fragment plays a dominant role in the binding 
process and can be placed with high confidence
in a well-characterized specificity pocket, such
as the S1 pocket in thrombin. A further issue
to consider is mutual fragment dependencies;
that is, when multiple fragments are hooked
up to a scaffold in a sequential manner, the
results can depend on the sequence in which
they are added (see, for example, Ref. 363).
Thus, in unfavorable cases, different orders of
attachment have to be followed to circumvent
this possible limitation.

5.2 Seeding Experiments to Assess Docking 
and Scoring in Virtual Screening 

True enrichment factors can be calculated 
only if experimental data are available for the 
full library, although such situations are un¬ 
usual. Accordingly, studies using enrichment 
factors as a figure-of-merit to assess the per¬ 
formance of a virtual screening can serve for 
theoretical validation purposes only. Several 
authors have tested the predictive ability of
docking and scoring tools by compiling an ar¬ 
bitrary set of diverse, drug-like compounds 
complemented by a number of known active 
compounds. This "seeded" library is then sub¬ 
jected to the virtual screening, and for the pur¬ 
pose of assessment it is assumed that the 
added active compounds are the only true ac¬ 
tives in the library. Clearly, this is a rather 
questionable assumption. 

Several seeding experiments have been 
published. An example has been performed at 
Merck using FLOG (121). A library consisting 
of 10,000 compounds including inhibitors of
various types of proteases and HIV protease 
was docked into the active site of HIV pro¬ 
tease. This resulted in excellent enrichment of 
the HIV protease inhibitors: all inhibitors but 
one were among the top 500 library members. 
However, inhibitors of other proteases were 
also considerably enriched (366). 

Seeding experiments also allow for comparisons
of different docking and scoring proce¬
dures, as shown, for example, by Charifson et 
al. (86), Bissantz et al. (230), and Stahl and 
Rarey (287). Charifson et al. compiled sets of
several hundred active molecules for three dif¬ 
ferent targets, p38 MAP kinase, inosine mono¬ 
phosphate dehydrogenase, and HIV protease. 
These were docked into the corresponding active
sites together with 10,000 randomly selected,
but drug-like, commercially available
compounds using DOCK (327) and the Vertex
in-house tool Gambler. ChemScore (80, 188),
the DOCK AMBER force field score, and PLP
(185) performed consistently well in enriching 
active compounds. This result was partially 
attributed to the fact that a rigid-body optimi¬ 
zation could be carried out with these func¬ 
tions because they include repulsive terms in 
contrast to many other tested functions. Stahl 
and Rarey compared DrugScore (226), PMF 
(317), PLP (185), and the original FlexX score 
using FlexX for docking (110,130,138). Inter¬ 
estingly, the two knowledge-based scoring 
functions performed differently. DrugScore 
achieved better ranking for the tight-binding 
ligands in narrow lipophilic cavities of COX-2 
and the thrombin S1 pocket. In contrast, PMF
obtained better enrichment for the case of the
very polar binding site of neuraminidase. Obviously,
a general strength of PMF is the description
of complexes showing multiple hydrogen
bonds. This has also been noted in the
study by Bissantz et al., in which PMF was 
found to perform well for the polar target thy¬ 
midine kinase and less well for the estrogen 
receptor (230). 

5.3 Hydrogen Bonding versus Hydrophobic 
Interactions 

A balanced description of the contribution of 
hydrogen bonding and hydrophobic interac¬ 
tions to the total score is of general impor¬ 
tance, to avoid a bias toward either highly po¬ 
lar or completely hydrophobic molecules. The 
actual parameterization of a scoring function 
depends on the compilation of the data set 
used to develop the function. Empirical scor¬ 
ing functions are more likely affected by the 
data set composition used for parameteriza¬ 
tion, but can be quickly reparameterized. In 
the case of knowledge-based functions such a 
readjustment is more difficult to perform; 
however, because of the much larger data¬ 
bases used for their development, they are 
supposed to be less dependent on special data 
set compilations. 

The PLP function, for example, addresses 
general steric complementarity and hydro- 
phobic interactions based on rather long- 
range pair potentials, whereas the FlexX score 
concentrates on hydrogen-bond complementarity.
This is clearly reflected in results of
database-ranking experiments. To combine 
the virtues of both scoring functions and to 
construct a more robust general function, a 
combination of PLP and FlexX called ScreenScore
has recently been published (287). It
was derived by a systematic optimization of 
library ranking results over seven targets and 
covers a wide range of active sites with respect 
to form, size, and polarity. ScreenScore ob¬ 
tains good enrichments for COX-2 (highly li¬ 
pophilic binding site) and neuraminidase 
(highly polar site), whereas the individual 
functions fail in one of the two cases. The au¬ 
thors of PLP have recently enhanced their 
scoring function by including directed hydrogen
bonding terms (367). Similar to ScreenScore,
this could also lead to a more robust
scoring function. 

5.4 Finding Weak Inhibitors 

Seeding experiments are often carried out 
with a small number of active compounds that 
are already optimized for binding to the stud¬ 
ied target. Enrichment factors based on the 
retrieval of these compounds are not very con¬ 
clusive because the recovery of potent inhibi¬ 
tors from a large set of candidate molecules is 
significantly easier than the discovery of new, 
but usually rather weak inhibitors from a 
large majority of nonbinders. In general, as in 
HTS, one can only expect hits from virtual 
screening that bind in the low micromolar 
range. 

Nevertheless, a recent study showed that 
library screening can also successfully detect 
very weak ligands. Approximately 4000 com¬ 
mercially available compounds had been 
screened for FKBP-binding by means of the 
SAR-by-NMR technique (368) and 31 com¬ 
pounds with activity in the low millimolar 
range were detected. This set of compounds 
was flexibly docked into the FKBP binding site 
using DOCK 4.0 with the PMF scoring func¬ 
tion (369). Interestingly, significant enrich¬ 
ment factors of 2 to 3 were achieved, whereas 
scoring with the standard AMBER score of 
DOCK did not really provide an enrichment. 

5.5 Consensus Scoring 

Different scoring schemes focus on different 
aspects as most important contributions to 
binding. However, these differences do not
necessarily become obvious when calculating 
binding affinities of known active compounds. 
In contrast, the scoring of non-active com¬ 
pounds could unravel such differences. Vertex 
has reported good experience with so-called 
consensus scoring. Here, docking results are 
scored by several distinct functions and only 
those hits are considered that are rendered 
prominent by several of the functions. A sig¬ 
nificant decrease in false positives has been 
described (86), but inevitably a number of true 
positives is lost (see, for example, Ref. 230). 

When consensus scoring is applied, one 
should thus keep in mind that, although the 
number of false positives can be reduced, the 
danger exists to discard some active com¬ 
pounds highlighted by only one of the scoring 
functions. This would, for example, apply to 
the above-mentioned PLP and FlexX scoring 
functions, which emphasize different aspects 
of ligand binding. Here, consensus scoring 
could be counterproductive. Therefore, along 
with consensus scoring, the individual scoring 
results should be consulted. Generally, how¬ 
ever, it appears that one can expect more ro¬ 
bust results from consensus scoring. 
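
One simple way to realize consensus scoring is rank-based voting, sketched below. This is an assumption made for illustration and not the procedure of Ref. 86; the function names `func_A` to `func_C` and all scores are invented, and lower scores are taken to be better.

```python
def consensus_hits(scores_by_function, top_fraction, min_votes):
    """Compounds ranked in the top fraction by at least min_votes functions.

    scores_by_function: {function_name: {compound: score}}, lower = better.
    """
    votes = {}
    for scores in scores_by_function.values():
        ranked = sorted(scores, key=scores.get)
        cutoff = max(1, int(len(ranked) * top_fraction))
        for compound in ranked[:cutoff]:
            votes[compound] = votes.get(compound, 0) + 1
    return sorted(c for c, v in votes.items() if v >= min_votes)


scores = {
    "func_A": {"c1": -9.0, "c2": -8.0, "c3": -3.0, "c4": -1.0},
    "func_B": {"c1": -7.5, "c2": -2.0, "c3": -6.0, "c4": -1.5},
    "func_C": {"c1": -5.0, "c2": -4.0, "c3": -1.0, "c4": -4.5},
}
strict = consensus_hits(scores, top_fraction=0.5, min_votes=3)
lenient = consensus_hits(scores, top_fraction=0.5, min_votes=1)
print(strict, lenient)
```

The strict setting keeps only the compound favored by all three functions, while the lenient setting also retains compounds highlighted by a single function. Comparing the two lists is one way to consult the individual scoring results alongside the consensus, as recommended above.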

5.6 Successful Identification of Novel Leads 
through Virtual Screening 

A considerable number of publications have 
proved that virtual screening can be efficiently 
used to discover novel leads (11,13,142, 370- 
375). Some of the most recent examples are 
briefly presented in the following. 

The program ICM has been used to identify 
novel antagonists for a nuclear hormone receptor
(201) and, together with DOCK, to find
inhibitors for the RNA transactivation response
element (TAR) of HIV-1 (25). The virtual
screening protocol started with 153,000
compounds from the Available Chemicals Di¬ 
rectory (ACD) (376) and involved increasingly 
elaborate docking and scoring schemes as the 
screening proceeded toward smaller selections 
of compounds. In the HIV-1 TAR study, the 
ACD library was first rigidly docked into the 
binding site using the DOCK program along 
with a simple contact scoring scheme. Then, 
20% of the best-scoring compounds were sub¬ 
jected to flexible docking with ICM and an em¬ 
pirical scoring function specifically tailored to 
RNA targets, providing a selection of approximately
5000 compounds. This was followed by
two additional steps involving longer sampling
of conformational space to retrieve the 350
most promising candidates. Of these, a very
small fraction was tested experimentally and
two compounds were found to significantly reduce
the binding of the Tat protein to HIV-1
TAR (CD50 ≈ 1 µM).

Recently, Grueneberg et al. discovered subnanomolar
inhibitors of carbonic anhydrase II
by virtual screening (15). The study was per¬ 
formed following a protocol of several consec¬ 
utive steps of hierarchical filtering (Fig. 7.3). 
Carbonic anhydrase II is a metalloenzyme 
used as prominent target for the treatment of 
glaucoma. Its binding site is a rather rigid, 
funnel-shaped pocket. Known inhibitors such 
as dorzolamide bind to the catalytic zinc ion by
a sulfonamide group. In a recent crystallo¬ 
graphic study it could be demonstrated that 
only the sulfonamide group represents an 
ideal anchor for zinc coordination (377). An 
initial data set of 90,000 entries from the Maybridge
(378) and LeadQuest (379) libraries
was converted to 3D structures with Corina 
(380). In a first filtering step, compounds were 
requested to possess a known zinc-binding 
group. These compounds were then processed 
through UNITY (355) using a protein-derived
pharmacophore query. The pharmacophore 
hypothesis had been constructed from a "hot 
spot" analysis of the available X-ray struc¬ 
tures of the enzyme. This yielded a set of 3314 
compounds. In a subsequent filtering step, the 
known inhibitor dorzolamide was used as a 
template onto which all potential candidates 
were flexibly superimposed by means of the 
program FlexS (330). The top-ranking com¬ 
pounds from this step were docked into the 
binding site with FlexX (110,130,138), taking 
into account four conserved water molecules 
in the active site. After visual inspection, 13 
top-ranking hits were selected for experimen¬ 
tal testing. Nine of these compounds showed 
activities below 1 µM, and three had Ki values
below 1 nM. Two of the hits were also examined
crystallographically. The docking solution
predicted as best by DrugScore was found
to be closer to the experimental structure than 
the one predicted by the FlexX score. 





This strategy of hierarchical filtering, starting
with a mapping of candidate molecules
onto a binding site-derived pharmacophore,
followed by a similarity analysis with known
ligands using either FlexS (330), SEAL (254-256),
or FeatureTrees (357, 358), and concluded
by flexible docking with FlexX, has
meanwhile been applied to three other proteins
in the same laboratory. For t-RNA guanine
transglycosylase, thermolysin, and aldose re¬ 
ductase, novel micromolar to submicromolar 
lead structures could be discovered. Most chal¬ 
lenging in this context is aldose reductase be¬ 
cause it performs pronounced induced fit 
changes upon ligand binding. Crystal struc¬ 
ture analysis of a micromolar hit retrieved by 
virtual screening clearly revealed known and 
new areas of induced fit adaptation. The crys¬ 
tal structure obtained with this hit provides a 
good starting point for further lead optimization.

The de novo design of inhibitors of the bac¬ 
terial enzyme DNA gyrase, a well-established 
antibacterial target (381), is another example
of successful structure-based virtual screening,
reported by Roche (16). HTS performed
on the proprietary compound library provided
no suitable lead structures. Therefore, a new 
rational approach was developed to discover 
potential lead structures using structural in¬ 
formation of the ATP binding site in subunit B 
of the enzyme. At the onset of the project, the 
crystal structures of DNA gyrase subunit B 
complexed with a substrate analog and two 
inhibitors were available. In the buried part of 
the pocket they all donate a hydrogen bond to 
an aspartic acid side-chain and accept one 
from a conserved water molecule. As a design
concept, the formation of these two key hydro¬ 
gen bonds has been defined as mandatory. As 
an additional requirement, a lipophilic portion 
forming hydrophobic interactions with the en¬ 
zyme was demanded. A new assay was estab¬ 
lished to allow for the detection of weakly 
binding inhibitors. A computational search of 
the ACD (376) and the Roche Compound In¬ 
ventory identified hits having low molecular 
weights and matching the above-mentioned 
criteria. Relying on the results of the in silico
screening [based on docking with LUDI and
a pharmacophore search with CATALYST
(356)], 600 compounds were tested initially.
Then, close analogs of the first series of hits 
were assayed, resulting in a total screen of 
3000 compounds. This provided 150 hits, clus¬ 
tered into 14 chemical classes. Seven of these 
classes could be demonstrated as novel DNA 
gyrase inhibitors competing for the ATP binding
site. Subsequent structure-based optimization
resulted in inhibitors with potencies
equal to or up to 10 times better than those of 
known antibiotics. 


6 OUTLOOK 

The first docking programs were introduced 
about 20 years ago, and the publication of the 
first generally applicable scoring functions 
dates back about 10 years. Since then, much 
experience has been gained in developing and 
applying docking algorithms, using scoring 
functions, and assessing their accuracy. Sig¬ 
nificant progress has been made over the last 
few years and it appears as if there are now 
docking tools available to address a variety of 
goals with considerable accuracy, from the 
precise and detailed analysis of binding inter¬ 
actions for a small set of ligands up to a fast 
screening of large compound collections. Sim¬ 
ilarly, scoring functions are currently avail¬ 
able that can be applied to a wide range of 
different proteins and consistently yield a con¬ 
siderable retrieval of active compounds. As a 
consequence, the pharmaceutical industry in¬ 
creasingly uses virtual screening to identify 
possible leads. 

In fact, structure-based design is now es¬ 
tablished as an important approach to drug 
discovery complementing HTS (382), although
HTS has a number of serious disadvantages.
It is expensive (383) and it leads to
many false positives and a disappointingly 
small number of real leads (384, 385), particularly
if screening is performed on a member
of a new protein class. Also, not all assays are 
easily amenable to HTS requirements. Finally,
despite the library sizes of several million
entries available to the pharmaceutical industry,
these compound collections do not
approach the size and diversity needed to even 
approximately cover the chemical space of 
drug-like organic molecules. Accordingly, focused
design of novel compounds and compound
libraries should only gain importance.
In light of current trends in structural geno¬ 
mics and patenting strategies, one may specu¬ 
late that structure-based de novo design will 
become much more important in the near fu¬ 
ture. 

To meet the increasing demands being placed on virtual screening, the
development of more reliable scoring functions is certainly vital for
success. In addition, novel or improved docking algorithms are
required. We conclude by summarizing our perspective on major
challenges in the further development of docking procedures and scoring
functions:

1. The fact that protein-ligand interactions occur in aqueous solution
is generally appreciated, but not yet adequately accounted for in
molecular docking procedures. In particular, the simultaneous placement
of explicit water molecules upon docking, accurate estimates of the
water versus ligand interaction-energy balance, and the fast prediction
of protonation states in binding pockets await a more satisfactory
solution.

2. The consideration of a sufficient degree of protein flexibility
needs to become part of standard docking approaches. This will require
faster algorithms. In addition, with respect to scoring, an often
overlooked aspect of this problem is that as soon as receptor
flexibility is allowed, protein conformational energy changes need to
be accounted for appropriately.

3. Although flexible-ligand docking has already become standard
practice, the error rate in predictions of interaction geometries is
still significant for more flexible ligands. Again, more efficient
algorithms will be required to sample the conformation space more
thoroughly.

4. Polar interactions are still not treated adequately. It is striking
that, even though the role of hydrogen bonds in biology has been
appreciated for a long time and the nature of hydrogen bonds is
qualitatively well understood, their quantitative energetic description
in protein-ligand interactions is still unsatisfactory (65).

5. All scoring functions are essentially expressed as simple analytical
functions fitted to experimental binding data. The presently available
crystal data on complex structures are strongly biased toward peptidic
ligands. Because these data are used for the development of scoring
functions, many overestimate the role of polar interactions. The
development of improved scoring functions clearly requires access to
better data, especially for nonpeptidic, low molecular weight,
drug-like ligands, including weakly binding compounds.

6. Unfavorable interactions and unlikely 
docking modes are not penalized strongly 
enough. Methods for taking such undesired 
features into account are still lacking in 
presently available scoring functions. 

7. So far, fast scoring functions cover only 
part of the whole receptor-ligand binding 
process. A more detailed picture could be 
obtained by taking into account properties 
of the unbound ligand, that is, solvation 
effects and energetic differences between 
the low-energy solution conformations and 
the bound conformation. 
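
Points 5 and 7 can be made concrete with a generic regression-type scoring function. The form below is only a sketch under stated assumptions: the descriptor symbols (a hydrogen-bond count N_hb, a lipophilic contact area A_lipo, a rotatable-bond count N_rot) and the two correction terms are introduced here for illustration and are not the terms of any particular published function.

```latex
% Fast empirical score: a weighted sum of simple descriptors of the complex.
% The coefficients c_i are fitted to experimental binding data (point 5).
\Delta G_{\text{bind}} \approx c_0
  + c_{\text{hb}}\, N_{\text{hb}}
  + c_{\text{lipo}}\, A_{\text{lipo}}
  + c_{\text{rot}}\, N_{\text{rot}}

% Possible extension in the sense of point 7: properties of the unbound ligand,
% i.e. a solvation term and the energy gap between the bound conformation and
% the low-energy solution conformations.
\Delta G_{\text{bind}}' \approx \Delta G_{\text{bind}}
  + \Delta G_{\text{desolv}}^{\text{ligand}}
  + \bigl( E_{\text{conf}}^{\text{bound}} - E_{\text{conf}}^{\text{solution}} \bigr)
```

Because the coefficients are obtained by fitting, any bias in the training complexes (for example toward peptidic ligands) carries over directly into the weights, which is the concern raised in point 5.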

Table 8.2 Comparative EST Counts for Five Genes Sequenced from Normal Prostate, Stage
B2 Cancer, Stage C Cancer, and Benign Prostatic Hyperplasia (BPH) cDNA Libraries

Gene          Normal      Stage B2 Cancer      Stage C Cancer       BPH                 All Other
              Prostate    Tags  P              Tags  P              Tags  P             Tissue
              Total

PSA           13          7     0.7-0.8        14    0.6-0.7        22    0.8-0.9       0
PAP           4           1     0.1-0.2        34    >0.999         9     0.7-0.8       1
HGK           1           7     >0.999         6     0.97-0.98      5     0.8-0.9       0
PS1           0           3     0.993-0.994    7     0.997-0.998    1     0.4-0.5       0
PS2           0           2     0.97-0.98      7     0.997-0.998    0     0-<0.1        0
Total clones  4500        1400                 3400                 4800                732,000


The tag counts are from Ref. 21. The P values are calculated according to Equation 8.1, modified for use with different
total EST counts from the source libraries. The web URL http://igs-server.cnrs-mrs.fr/~audic/cgi-bin/winflat.pl was used to
calculate the probability intervals. A P value nearer to 1 indicates that the differential expression is likely to be significant.
While prostate specific antigen (PSA) and glandular kallikrein (HGK) have been proposed as prostate cancer markers, both
PS1 and PS2 are prostate specific. Thus, the down-regulation of PAP in stage B2 cancer is not significant using this test,
whereas the test shows its up-regulation in the BPH sample to be more significant. So, for lower changes in copy number,
where more sensitivity is expected, this test of significance is a valuable tool.


overall profiles obtained from tag counting experiments could be
performed using the traditional χ² test. However, this is the wrong
approach for experiments where the significance of differences between
expression levels (i.e., tag counts) of individual genes is to be
determined, for example, in diseased and normal tissue states (19). One
of the issues in performing tag-sampling experiments is that the
experiments themselves are usually not replicated. Thus, the dispersion
of results cannot be used to estimate the standard errors (SEs)
associated with each expression measurement. This eliminates the
possibility of using standard tests of variance. Instead the Poisson
distribution, which includes an implicit estimate of standard error,
approximates random sampling of tags very well. Audic and Claverie (20)
have proposed a significance test (see Equation 8.1) in which the
sample size plays no part, so long as it is the same for both
experiments, but only depends on the observed tag counts of the same
gene from diseased, $g_A$, and normal, $g_B$, states:


\[
P(g_B \mid g_A) = \frac{(g_A + g_B)!}{g_A!\; g_B!\; 2^{(g_A + g_B + 1)}} \tag{8.1}
\]
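
Equation 8.1 is simple enough to evaluate directly. The following is a minimal sketch, assuming two libraries of equal size as the equation requires; the function names are hypothetical and chosen only for this illustration.

```python
from math import comb


def audic_claverie_equal(g_a: int, g_b: int) -> float:
    """P(g_B | g_A) from Equation 8.1, valid when both libraries are the same size."""
    if g_a < 0 or g_b < 0:
        raise ValueError("tag counts must be non-negative")
    # (g_A + g_B)! / (g_A! g_B!) is the binomial coefficient C(g_A + g_B, g_A).
    return comb(g_a + g_b, g_a) / 2 ** (g_a + g_b + 1)


def cumulative_equal(g_a: int, g_b: int) -> float:
    """Probability of observing g_b or fewer tags in the second library."""
    return sum(audic_claverie_equal(g_a, k) for k in range(g_b + 1))
```

Summed over all possible values of the second count, the expression approaches 1, so it behaves as a probability distribution for one count given the other. Cumulative sums of this kind are one way to turn the point probability into a tail probability for a significance statement.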


The equation has also been extended to cover the more practical case of
different total numbers of tags. Thus, taking some data from Fannon
(21) as an example, we can calculate values for the probability of
certain genes expressed at different levels in normal prostate, stage
B2 cancer, stage C cancer, and tissue from a benign prostatic
hyperplasia (BPH) sample as shown in Table 8.2.
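
The extended form is not written out in this chapter. As a hedged sketch, the version below uses the generalization commonly attributed to Audic and Claverie for libraries of N1 and N2 total tags, p(y | x) = (N2/N1)^y (x + y)! / [x! y! (1 + N2/N1)^(x + y + 1)], which reduces to Equation 8.1 when N1 = N2. The function names are hypothetical, and how the probability intervals in Table 8.2 were derived from such values is not specified here, so no attempt is made to reproduce them.

```python
from math import exp, lgamma, log


def audic_claverie(x: int, y: int, n1: int, n2: int) -> float:
    """p(y | x) for tag counts x and y drawn from libraries of n1 and n2 total tags."""
    ratio = n2 / n1
    # Work in log space so that large counts do not overflow the factorials.
    log_p = (
        y * log(ratio)
        + lgamma(x + y + 1) - lgamma(x + 1) - lgamma(y + 1)
        - (x + y + 1) * log(1.0 + ratio)
    )
    return exp(log_p)


def cumulative_probability(x: int, y: int, n1: int, n2: int) -> float:
    """Probability of observing y or fewer tags in library 2, given x in library 1."""
    return sum(audic_claverie(x, k, n1, n2) for k in range(y + 1))


# Example call with the PAP counts of Table 8.2:
# 4 tags in 4500 normal prostate clones, 34 tags in 3400 stage C clones.
pap_stage_c = cumulative_probability(4, 34, 4500, 3400)
```

A cumulative value close to 1 means that a count as high as the observed one would rarely arise by sampling alone, which matches the reading of the P values given in the notes to Table 8.2.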

The relationship between gene expression, mRNA level, and protein
expression is complex and not one that can be gleaned from collecting
copy number information in this type of experiment. Even with careful
statistical analysis, such as that described above, the assumption that
increases or decreases in copy number reflect real biologically
significant events relies on the confidence with which we can compare a
library made from one set of cells to a library made from a different
set of cells. Thus, most transcript analysis experiments setting out to
be quantitative end up simply as target identification exercises. A
major goal of proteomics is to generate a factory-type approach to
profiling protein level expression that more closely reflects the
biological reality. The EST approach has been turned into an industrial
scale process but has not been able to impact the drug discovery
process significantly because of the biological limitations described
and the lack of sound mathematical modeling of the whole process.

Expression experiments are measures of 
cell population averages, not the contents of 
individual cells, so it is important to consider 
to what extent all cells in the candidate popu-

1 INTRODUCTION

In January 2001, the biopharmaceutical company Millennium announced
that, as part of a multimillion-dollar research collaboration with
Bayer, an anticancer agent had proceeded to clinical trials (1, 2). The
remarkable achievement was not that the collaboration between a
pharmaceutical company and a high technology genomics company had been
so successful in terms of a product, but that it had ostensibly lopped
2 years from the discovery lifecycle in the process. As a result of
perceived improvements in research efficiency such as this, more effort
than ever is being placed in development and implementation of genomics
and screening technology automation. The substantial volumes of data
thus generated are used to pan for innovative lead compounds for novel
therapeutic targets to feed ever more voracious development pipelines
(3).

The genomics tools of rapid sequence screening, microarray chips,
expression analysis, protein interactions, macromolecular structure
determination, sequence comparison, and a host more are all biological
techniques that generate different types of raw data. Moreover, the
data are produced in far greater quantity than has been seen in biology
before and the sheer speed with which the data are produced is
unprecedented. The improvements in process throughput, which are so
exciting to the financial markets and so critical to the alleviation of
pain and suffering in human populations, are readily achievable because
of the science of information integration and knowledge transformation
that are the hallmarks of bioinformatics. It is not enough simply to
produce data, even from the most leading edge of techniques; we must be
able to manage it effectively and extract useful information that leads
to that critical knowledge on which realistic drug discovery decisions
can be made.

The opening example illustrates how genomic technologies, including
bioinformatics, are making an impact at the drug discovery level. The
purpose of this article is to provide some background to the tools and
technologies that are used on a daily basis by bioinformatics
scientists in an effort to make the subject more accessible to readers
with a non-biological training. This article is not, however, a
training manual in running programs nor is it an index to the latest
resources on the Internet. Both bioinformatics and the World Wide Web
grow at such a pace that any journal article is likely to be
out-of-date before it goes to press. Nevertheless, the Internet is a
crucial resource for the practicing bioinformatics researcher. All of
the core uses for executing projects are available there both in terms
of databases and computer programs. Lists and descriptions would fill
many volumes. Certain resources are mentioned where necessary to
illustrate specific examples. Some useful starting points for an
exploration of bioinformatics on the Internet are shown in Table 8.1.
An introduction to tools and techniques is available in Ref. 4, while
Ref. 5 contains a more technical approach based on understanding of
machine learning. For a more mathematical treatment, readers are
referred to Ref. 6. Reference may be made to the journal Briefings in
Bioinformatics for reasonably accessible descriptions of systems and
processes. The journal Bioinformatics is aimed at a more technical
audience. The annual database issue of Nucleic Acids Research is the
generally accepted place for publication of bioinformatics databases.
Some fundamental papers referred to are relatively old for a young
science.

Table 8.1 A Selected List of Key Websites for Further Exploration
of Online Bioinformatics Resources

Internet Site                     Brief Description

http://www.ncbi.nlm.nih.gov       The National Center for Biotechnology Information (NCBI). Located
                                  at the National Library of Medicine in Bethesda, MD, USA. The
                                  home of the GenBank DNA sequence database; PubMed literature
                                  search engine; sequence search tools (e.g., PSI-BLAST); genomic
                                  sequence navigation tools. A substantial repository of resources in
                                  all areas of bioinformatics.

http://www.ebi.ac.uk              The European Bioinformatics Institute (EBI). This site is located at
                                  Hinxton Hall, Cambridge, UK. The home of the EMBL Nucleotide
                                  Sequence Database; data management tools [including publicly
                                  accessible version of SRS, the Sequence Retrieval System (7)];
                                  protein family databases; microarray tools; etc. An extensive
                                  repository of resources for bioinformatics.

http://www.expasy.ch              The Expert Protein Analysis System. Dedicated to the analysis of
                                  protein sequence and structure as well as two-dimensional PAGE.
                                  Home of the SWISS-PROT protein knowledgebase and TrEMBL
                                  computer annotated supplement (8).

http://www.man.ac.uk              The University of Manchester Bioinformatics Education and
                                  Research site (UMBER). Useful because it is the home of the
                                  PRINTS (9, 10) resource for protein fingerprint analysis and a
                                  valuable teaching site for bioinformatics.

http://www.ensembl.org            Site developed by the Sanger Centre, Hinxton Hall, Cambridge, UK
                                  and the EMBL-EBI, presenting tools for browsing and researching
                                  the human genome sequence (11). This is a public access server
                                  providing data and access at no charge. Commercial sites are also
                                  available for working on commercially produced human genome
                                  sequence.

http://www.mips.biochem.mpg.de    The Munich Information Center for Protein Sequences (MIPS).
                                  Provides a different view of several model organism genomes with
                                  tools for analysis.

Rather than delve into minutiae here, the aim has been to present an
overview of the way bioinformatics is being used in the process of drug
discovery in 2002. Many important and exciting aspects of the academic
research that is being carried out are passed over in silence or
referred to only by a brief comment. Computer technologies change
rapidly and bioinformatics has always been at the forefront in applying
new computing paradigms to biological problems (e.g., use of the
Internet, object-oriented programming technologies, neural networks,
parallel computing). Some of the molecular biology or biophysical
aspects of techniques used to generate data are outlined from the point
of view of a bioinformatician and not that of a practicing molecular
biologist (although it has been reviewed by one, see the
acknowledgments) to give a flavor of the kinds of experiments that are
performed.

2 BIOINFORMATICS IN DRUG 
DISCOVERY 

Bioinformatics in drug discovery has traditionally been used as a tool
for finding new drug targets ("target selection"). This technique of
target discovery is an important contribution that bioinformatics has
been able to make to the drug discovery process. However,
bioinformatics alone without the background of molecular biological and
biophysical experiments is sterile. To understand even a bird’s-eye
overview such as this, some basic material has to be covered to enable
comprehension both of the data and the manner of its analysis.

The second use to which bioinformatics has been put in the drug
discovery process is more fundamental. It is concerned with the use of
techniques in molecular sequence analysis to generate relationships
between sequences that are themselves used to provide fundamental
structures for databases of drug discovery information. Relationships
between data elements are important because they help to place
individual elements in a context that can be readily assimilated by the
user of the system. In many situations, observers approach data from
different points of view and bring to bear the richness of differing
scientific experiences. Whether we care to admit it or not,
"biologists" and "chemists" have different training and background and
thus offer a range of opinions on similar pieces of information. Even
the word "activity" means different things to a chemist who has
synthesized a group of compounds or a biologist who has developed an
assay to test the compounds. Both aspects are necessary for the
discovery of new drugs, but they are different viewpoints that need to
be supported by appropriate relationship mining in the data. If the
bioinformatics job is done well, both views can be accommodated in the
data structures and user interfaces used by both sets of users.

Throughout the pharmaceutical industry, bioinformatics and
chemoinformatics groups are working more closely together than has been
the case hitherto. This is a consequence of the realization that
managing data effectively requires integration of thinking (about
definitions of common attributes of molecules both small and large),
integration of processes, and integration of implementation. The recent
rise in popularity in bioinformatics of the ontology is an example of
the application of a computer science paradigm to the issue of
redundancy in nomenclature in many areas of biology. Application across
the chemistry-biology domain interface could well be beneficial for
drug discovery effectiveness. The ontology is simply a means to an end;
in this instance, that end is improved communication and understanding
of basic concepts within and across the boundaries of major scientific
disciplines. There may, of course, be a variety of other means to reach
that goal.

3 WHAT IS BIOINFORMATICS? 

3.1 Definitions 

Concisely, bioinformatics is our ability to organize biological data.
From another perspective, bioinformatics is our ability to understand
how biological information is organized. From this understanding should
spring an enhanced view of the interactions between biological
molecules. This should, in turn, inform our search strategies for new
small molecules that will modulate the behaviour of biological
molecules to give a beneficial therapeutic effect. These definitions
arise from observation of the way diverse skills are brought to bear in
attempting to answer biological questions. They also stress the
importance of organizing and understanding biological data, rather than
linking these aspects strongly to specific hardware or software
implementations. Use of computers may be involved in the process but
the definitions are not limited by the application of any particular
technology.

Bioinformatics has also been defined as the application of computer
technology to solving biological problems. This definition, perhaps
what some would consider to be the canonical one, is broad but
restricts its scope to problems to which computer technology can be
applied.

3.2 Integration of Information 

Bioinformatics has become a byword for integration; specifically the
integration of data across different data resources to generate
integrated information resources. Linking data and information in this
way is fundamental to bioinformatics activities and so some discussion
of the meaning of data, information, and knowledge in the context of
bioinformatics for drug discovery is provided in Section 6. Integration
is important because it provides context, or at least a background,
against which computational analyses are performed. In the past, for
single molecule experiments, this background was achieved through
reading the literature. Now that multiple molecule experiments are
common, even genome-wide or inter-genome analyses, it is simply not
practical any longer to rely on the literature in its raw form, unless
it is part of an integrated knowledge-based approach that provides
connections between disparate pieces of information, backed up by
experimental evidence from which to draw conclusions (12).

3.3 Bioinformatics and Skills 

The pursuit of bioinformatics involves a number of different skills.
Organizing, storing, retrieving, and querying sets of biological data
are techniques that lie at the heart of the subject. An ability to
analyze the characteristics of particular sets of biological data is
fundamental. The translation of those characteristics into electronic
representations that can be organized on a large scale is the domain of
the bioinformatics software and database developer. The process of
analyzing and understanding biological data using the tools available
is the domain of the bioinformatics analyst. When new tools are in the
course of development, substantial interaction between the two skill
sets is essential.

In the pharmaceutical environment, both developer and analyst skills
are necessary. This is so even where commercial software is in use,
because there is no single system available commercially that provides
the level of integration between the worlds of bio- and
chemoinformatics necessary to effectively enhance the drug discovery
process. Some interfacing of different systems is required and the
warehousing of proprietary data is always an issue.

This broad description of bioinformatics and of the two types of
bioinformatics scientist is quite abstract. It does not detail the
characteristics of the data with which the bioinformatics scientist has
to work. Neither does it define the set of tools that the developer
should work with or implement. There was a time, in the late 1980s and
early 1990s, when the type of data was well defined. Molecular sequence
data, the stream of bases in DNA, and the stream of residues at the
protein level were the main types of data. Programmers developed code
in FORTRAN or C and scripting languages were immature.

Now, as science moves forward into a new millennium, additional types
of data have become important; for example, protein-protein
interactions and three-dimensional structure, high density gene
expression chips, cell imaging, etc. Developers have a wide range of
tools to call on, including high performance C and C++ compilers, rich
scripting languages (Perl, Python, etc.), and efficient, easily
accessible operating systems (particularly Linux) that make porting
software to different hardware platforms less of an issue than it was.

Of equal importance to the medicinal chemist, to whom this review is
principally directed, is the impact of bioinformatics on the discovery
of new medicines. Rather than explain comprehensively all the popular
tools and their underlying algorithms, this review focuses on the
points in the discovery research process where bioinformatics is making
an impact. Technologies will be described to the extent that such
understanding is necessary to grasp the relevance of the data being
generated and its significance.

3.4 Standardization 

Progress in linking items of relevant data and generating integrated
information resources would be very limited were it not for efforts in
standardization that have been brought about by international
collaboration. There is still a long way to go, however. While it is
becoming cheaper to obtain each piece of individual data, the
proportion of automated experiments is increasing, at least in the life
sciences, because of the ready availability of new technologies. It may
seem a simple matter to create resources that store and manage streams
of DNA bases, represented by the four alphabetic characters A, C, T,
and G. However, when we also wish to integrate information on
experimentally or computationally determined annotation and
cross-reference to other resources using gene names, there are
significant problems. The literature abounds with synonyms for gene
names and functions; even the labels given to specific cellular
functions are not always clearly defined (13).

To be able to process data automatically, it has to be presented in a
form that can be parsed by a computer program and must also include all
the elements necessary to an understanding of the biological system
under study. Reliable information systems should have source data of a
consistently high quality to prevent application errors and enable
integration into other biological information systems. Some progress is
now being made towards consensus in gene naming through the work of the
HUGO Gene Nomenclature Committee (see
http://www.gene.ucl.ac.uk/nomenclature/). Many researchers now use this
system as a source of unique gene names and descriptions in the
published literature (14) and in commercial products (e.g., see
http://www.biowisdom.com/). Standardizing vocabulary expressing the
relationships between the complex network of gene functions is the work
of the Gene Ontology (GO) project (see http://www.geneontology.org/).

4 BIOINFORMATICS AND TARGET
DISCOVERY

The desire to find new drug targets is grounded in the need of
pharmaceutical companies to address the requirements of different
disease markets. The literature is full of papers detailing sequence
determinations of newly cloned receptors and enzymes along with
research on their functional properties. The realization that the
sequences of genes could be acquired relatively cheaply through the use
of automated sequencing machines using fluorescent base technology,
rather than the previous generation of radioactive sequencing gels,
meant that sequence data began to flood the public DNA sequence
databases. The growth of these databases is reviewed in Fig. 8.1.

Because the translation of DNA coding sequence to protein sequence is
straightforward, given the understanding of the genetic code, it is a
trivial task to implement software to provide translations of open
reading frames (ORFs*) to be housed in the annotation sections of the
DNA databases. Consequently, protein sequence databases have become
swamped with hypothetical proteins, that is, those proteins assumed to
exist because open reading frames have been discovered and from which
hypothetical protein sequences had been computed. The function of these
sequences has been assigned through a comparison of their amino acid
residue similarity with that of known sequences (for example, those
that have had their biochemical function demonstrated through
heterologous expression). In this way, very large numbers of sequences
have been processed into the databases and annotated using such
sequence comparison techniques.
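
To show how little code the translation step needs, here is a minimal sketch using the standard genetic code. It treats an open reading frame simply as a run of residues not interrupted by a stop codon, scans only the three forward frames, and ignores start codons and the reverse strand; the function names are hypothetical and real annotation pipelines are considerably more involved.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(dna: str, frame: int = 0) -> str:
    """Translate one forward reading frame; '*' marks a stop, 'X' an unknown codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3)
    )


def open_reading_frames(dna: str, min_residues: int = 2) -> list[tuple[int, str]]:
    """List (frame, peptide) for every stop-free stretch in the three forward frames."""
    orfs = []
    for frame in range(3):
        for peptide in translate_frame(dna, frame).split("*"):
            if len(peptide) >= min_residues:
                orfs.append((frame, peptide))
    return orfs
```

Even a 12-base toy sequence yields a candidate peptide in each frame, which gives a feel for why databases filled so quickly with hypothetical proteins once translation was automated.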

When it comes to the practical details of how bioinformatics can speed
the process of drug discovery, it is reasonable to ask what sorts of
data could be valuable in that process. The stages of the drug
discovery process where bioinformatics makes an impact are target
identification, assay selectivity panel selection, and integration
throughout the assay development and screening process. Target
identification makes use of sequence data for functional assignment by
inference from similarity with known sequences (Fig. 8.2). It also
benefits from assessments of differential levels of expression in
different cellular contexts and at various stages in the expression
process (transcriptome or proteome, see Fig. 8.3 for definitions).
Selectivity panel selection relies on a thorough mining of the related
gene family and may benefit from phylogenomic analysis (see Section
5.3). Use of bioinformatics for integration of data capitalizes on the
generation of relationship information between known genes and the
ability to use hyperlinking to create navigational tools and usable
interfaces.

* ORFs are contiguous strings of residues, uninterrupted by the genetic
code's "stop" signal.

A further development, the production of millions of short expressed
sequence tags (ESTs), encouraged the focus on target discovery during
the 1990s. The development of EST technology itself spawned the genesis
of several new genomics companies, including Human Genome Sciences and
Incyte, which have worked in collaboration with the pharmaceutical
industry to hunt for new disease related targets.

Figure 8.1. Bar charts indicating the growth of GenBank from December 1992 to August 2001 in
terms of (a) bases and (b) sequence entries. The release files indicate no release in February 1999. It
is evident from the trends in both charts that while there has been explosive growth, particularly
from December 1999 until about August 2000, growth is slowing. The base entry curve is showing a
distinctly sigmoid shape.

Figure 8.2. A schematic illustrating the bioinformatics process required to create an online gene
index by collating data and then integrating related elements to generate value added information
through hyperlinking to online resources. Determination of phylogenetic relationships is a relatively
late stage in the process. EST analysis is only performed after phylogenetic relationships have been
determined because EST data does not cover the whole expressed sequence and may not therefore
cover regions that were included in the phylogenetic determination. This is a knowledge generation
phase because it is allowing placement of potential new targets within the context of a carefully
researched phylogenetic tree. Transfer of knowledge is intimately related to the environment in
which the results of analysis are made available, in this case as an online resource.

4.1 Functional Genomics and Target
Discovery

The collation of gene sequence data, from whatever source, is in effect
simply a matter of transferring data from one place to another; for
example, from a sequence chromatogram to a computer database. Learning
the sequence of a genome, or any of its constituent parts, is a long
way from understanding its biological function. The sequencing of
genomes has resulted in a technical genome description (at a particular
level of detail) through the process of cataloguing an organism's
genes. This level of detail is often called the physical map of the
genome. There are other ways of mapping the genome that provide
different levels (or we may think of it as resolution) of genomic
detail; genetic maps indicating the location of genes for specific
traits have been known for some time, while single nucleotide
polymorphism (SNP) maps can be used to highlight the positions of
differences between populations through study of genetic polymorphisms
(15). Indeed, the identification of genes themselves from genomic
sequence is itself a non-trivial matter, especially where those genes
are interrupted by non-coding regions (introns) and control regions
(expression promoter sites). Functional genomics is the process of
creating an understanding of the way genomes function through gene
expression. Genes are expressed by a variety of mechanisms, not all of
which are fully understood. We can, however, make some measurements of
the results of gene expression at the transcript level, mRNA, and at
the protein level. Several of the techniques that have been used to
assist drug target discovery are presented in the following sections.
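
The relationship between an interrupted gene and its transcript can be sketched in a few lines, assuming the exon boundaries are already known, which, as noted above, is the hard part in practice. The toy sequence, the coordinates, and the function names are all invented for this illustration.

```python
def splice(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Join exon intervals (0-based, end-exclusive) in genomic order; introns are dropped."""
    return "".join(genomic[start:end] for start, end in sorted(exons))


def transcribe(coding_dna: str) -> str:
    """Write the coding strand as mRNA by replacing thymine (T) with uracil (U)."""
    return coding_dna.upper().replace("T", "U")


# Toy gene: two exons (upper case) separated by one intron (lower case).
TOY_GENE = "ATGGCC" + "gtaagtccag" + "AAGTAA"
TOY_EXONS = [(0, 6), (16, 22)]

toy_mrna = transcribe(splice(TOY_GENE, TOY_EXONS))
```

Here the spliced coding sequence is 12 bases long although the genomic region spans 22, a small-scale picture of why gene counts and gene boundaries cannot be read directly off raw genomic sequence.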

4.2 Expression Profiling for Target Discovery 

Bioinformatics spans analysis in depth on small quantities of data
through to expansive genomic scale analyses, which may be at a lesser
level of detail. Historically, the expression of genes at the mRNA
level or at the protein level has been a crucial tool for assessing the
significance of specific classes of cells as targets. With the advent
of fluorescence-based sequencing techniques and automated sequencing
technology, it is now much quicker to generate sequence data on
specific molecular targets than ever. Many researchers spend entire
careers working on one target type or a restricted part of a target
gene family. This approach has yielded many valuable targets for drug
discovery. With the new technologies of molecular biology, it is now
possible to survey targets in a variety of contexts: perhaps within
different types of cells, cells treated with different agents, or even
across entire genomes using chip technologies.

There are issues of interpretation of experimental design and results. Does mRNA expression mean anything at a quantitative level? Perhaps even a qualitative view of mRNA expression can be misleading. How is mRNA expression correlated with protein expression? In general, most drugs we discover are likely to interact with proteins and not mRNA, so some understanding of protein expression is an essential adjunct to our genomic knowledge; hence the proteomics approaches described later. Our exploration of expression profiling begins with a study of mRNA transcript profiling using expressed sequence tags (ESTs) because this technique has led to rapid gene discovery that has, in turn, been able to assist with the annotation of genomic sequences. Then, we consider how whole genome expression profiles can provide a rich new source of data for bioinformatics analysis.

Figure 8.3. A schematic illustration of the relationships between levels of genomic information. Genomic DNA is contained in the nucleus of eukaryotic cells. In many species, including humans, information required to make up the coding sequence of a gene is split into exons (regions that are expressed) interrupted by introns (regions that are not expressed and are removed from the message by splicing after transcription). At either end of the gene sequence are untranslated regions (UTRs). 5' and 3' refer to the orientation of the strand of DNA as defined by the sugar-phosphate backbone. The mRNA is the messenger RNA molecule generated by the process of transcription, which is itself mediated by a number of enzymes. The collection of mRNA transcripts that make up the mRNA expression profile of a cell is known as the transcriptome, although the term could also refer to the total possible mRNA transcripts achievable from a genome. Finally, translation of the mRNA occurs on the ribosome and protein sequence is produced, which folds into its final three-dimensional shape, a process that may be assisted by a number of different chaperone proteins. Any post-translational modifications are all part of the proteome, the collection of proteins that represent the expressed genome.

4.2.1 EST Profiling. An EST is a short, single sequence run collecting data over about 200-400 bases from a clone selected from a cDNA library. Typically, cDNA clone libraries contain 1-3 million clones. The library itself is created from mRNA extracted from tissue or refined cell populations. By making a random selection of several thousand clones from a cDNA library, it is possible by sequencing ESTs to generate a rapid, if somewhat low resolution, survey of the types of genes represented by the library. The library in turn reflects the composition of genes that are expressed in the tissue or cell line from which it was constructed. Thus, we have a qualitative link between gene expression, at the mRNA level, and the sequence level analysis required for target identification, without the need to go through the full sequencing and validation process across the whole length of each clone. This is a very significant time and cost saving. One of the major issues of EST profiling has been the significance that can be ascribed to expression levels through counting copies of ESTs. This issue is dealt with in some detail in Section 4.2.4.

4.2.2 Sequence Assembly for cDNA Cloning. To appreciate fully the speed advantage of sequencing tags, rather than fully validating the sequence of an entire clone of a gene, it is useful to step through a brief description of the cloning process from the point of view of a practitioner of bioinformatics.

Sequence assembly is the process of dealing with the bioinformatics of cloning and genomic sequencing (16, 17). When a gene is cloned, it is selected from a set of potential clones in a cDNA library. The gene is present as a piece of cDNA inserted into a cloning vector (a piece of circular DNA) that has been designed for the purpose of cloning. It is necessary to check that the cDNA indeed represents the sequence of the gene that has been cloned. To do this, DNA oligonucleotides are designed that will bind in a complementary fashion (hybridize) to the DNA of the cloning vector and also at 150- to 200-base intervals along the cDNA itself. These oligonucleotides are then extended by a DNA polymerase, which adds bases complementary to the cDNA insert. Different polymerases are available commercially that provide high fidelity reproduction of the cDNA insert. In fluorescence-based sequencing, a small proportion of the nucleotides available to the polymerase are fluorescent analogues. Incorporation of one of these into the oligonucleotide terminates the extension, resulting in a population of oligonucleotides of different lengths. These are separated by electrophoresis and the sequence determined.

When the sequence of each oligonucleotide has been determined, the strings of letters that represent the bases are assembled together to generate a full-length sequence of the cDNA that has actually been cloned. Errors in the base sequence can be resolved at this stage and, if necessary, mutagenesis experiments can be designed to correct any mistakes. The bioinformatics process is intimately linked with the molecular biology techniques of cloning and sequencing. For target discovery, a very high degree of confidence in the sequence of the cDNA clone is required before the clone can be expressed and used in assay development. It can also be seen that this process is much lengthier than taking a single sequence read (a single oligonucleotide string) without correcting errors or considering coverage of the complete gene sequence.
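
The assembly step described above, in which individual sequence reads are merged into one full-length sequence, can be sketched in a few lines of Python. This is a minimal illustration only: the greedy merge strategy, the function names `overlap` and `assemble`, the `min_len` parameter, and the three toy reads are all assumptions introduced here. Real assemblers must also cope with sequencing errors, reads from both strands, and vector sequence.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble(reads: list[str], min_len: int = 3) -> str:
    """Greedily merge the two reads with the largest overlap until one contig remains.

    If no pair overlaps by at least `min_len` bases, the first two contigs
    are simply joined end to end (a simplification for this sketch).
    """
    contigs = list(reads)
    while len(contigs) > 1:
        best = (0, 0, 1)  # (overlap length, index of left read, index of right read)
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap(left, right, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs[0]


# Hypothetical reads, each overlapping the next by four bases.
toy_reads = ["ATGGCGTAC", "GTACGTTAG", "TTAGCCATA"]
full_length = assemble(toy_reads)
```

A single EST corresponds to just one of the entries in `toy_reads`; the loop in `assemble` is the extra work that full-length validation requires.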

4.2.3 Comparing ESTs with Databases. Bioinformatics provides the tools necessary to compare each EST with the databases of known genes, and a hypothetical functional assignment may be made to a proportion (typically 40-50%) of all the ESTs from a sequencing run.

In this way a rich resource of tags for many clones from many diverse libraries has been built up in the public domain and in commercially available, proprietary databases. One particular approach that generated much interest in the 1990s was that advocated by Incyte. Here, the simple identification of a gene expressed through identification of its EST was not the primary goal. Instead, the approach was based on comparative transcript expression (the so-called "digital Northern"). Here the number of copies of each EST identified was calculated, giving counts for the numbers of each type of EST found in comparing normal with diseased tissue, for example. Subsequent techniques have focused on more controlled experiments in which specific cell lines are treated with an agent and the expression of genes before administration is compared with the profile afterward. This is the basis of pharmacogenomics (18).
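
The tag-counting idea behind the "digital Northern" can be illustrated with a short sketch. It assumes that each EST has already been assigned to a gene by database comparison; the function name `digital_northern` and the gene labels are hypothetical and carry no biological meaning.

```python
from collections import Counter


def digital_northern(library_a: list[str], library_b: list[str]) -> dict[str, tuple[int, int]]:
    """Return {gene: (copies in library A, copies in library B)} for every gene seen."""
    counts_a, counts_b = Counter(library_a), Counter(library_b)
    genes = sorted(set(counts_a) | set(counts_b))
    # Counter returns 0 for genes that are absent from a library.
    return {gene: (counts_a[gene], counts_b[gene]) for gene in genes}


# Hypothetical gene assignments for ESTs sampled from two libraries.
normal_ests = ["gene1", "gene2", "gene1", "gene3"]
diseased_ests = ["gene1", "gene3", "gene3", "gene3"]
comparison = digital_northern(normal_ests, diseased_ests)
```

The raw pairs of counts say nothing about significance on their own, which is the subject of the next section.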

4.2.4 Statistics for Assessing Expression Level Significance. There are issues with approaches based on counting the number of copies of an EST observed in the output from a sequencing machine. First, the tissue or cell line must be of very high quality and the mRNA harvested in a timely manner because it degrades very quickly. Second, the process of preparing the cDNA library should enable the numbers of clones to be estimated as accurately as possible. Third, the random sampling for the sequencing runs must be controlled carefully so as not to introduce bias into the experiment. The mathematical model for evaluating the meaning of data from such experiments is not well worked out.

Comparison of the differences between the overall profiles obtained from tag counting experiments could be performed using the traditional χ² test. However, this is the wrong approach for experiments where the significance of differences between expression levels (i.e., tag counts) of individual genes is to be determined, for example, in diseased and normal tissue states (19). One of the issues in performing tag-sampling experiments is that the experiments themselves are usually not replicated. Thus, the dispersion of results cannot be used to estimate the standard errors (SEs) associated with each expression measurement. This eliminates the possibility of using standard tests of variance. Instead the Poisson distribution, which includes an implicit estimate of standard error, approximates random sampling of tags very well. Audic and Claverie (20) have proposed a significance test (see Equation 8.1) in which the sample size plays no part, so long as it is the same for both experiments, but only depends on the observed tag counts of the same gene from diseased, $g_A$, and normal, $g_B$, states:

$$P(g_B \mid g_A) = \frac{(g_A + g_B)!}{g_A!\, g_B!\, 2^{(g_A + g_B + 1)}} \qquad (8.1)$$

The equation has also been extended to cover the more practical case of different total numbers of tags. Thus, taking some data from Fannon (21) as an example, we can calculate values for the probability of certain genes being expressed at different levels in normal prostate, stage B2 cancer, stage C cancer, and tissue from a benign prostatic hyperplasia (BPH) sample, as shown in Table 8.2.

Table 8.2 Comparative EST Counts for Five Genes Sequenced from Normal Prostate, Stage B2 Cancer, Stage C Cancer, and Benign Prostatic Hyperplasia (BPH) cDNA Libraries

Gene           Normal Prostate   Stage B2 Cancer       Stage C Cancer        BPH                 All Other Tissue
               Total             Tags   P              Tags   P              Tags   P
PSA            13                7      0.7-0.8        14     0.6-0.7        22     0.8-0.9      0
PAP            4                 1      0.1-0.2        34     >0.999         9      0.7-0.8      1
HGK            1                 7      >0.999         6      0.97-0.98      5      0.8-0.9      0
PS1            0                 3      0.993-0.994    7      0.997-0.998    1      0.4-0.5      0
PS2            0                 2      0.97-0.98      7      0.997-0.998    0      0-<0.1       0
Total clones   4500              1400                  3400                  4800                732,000

The tag counts are from Ref. 21. The P values are calculated according to Equation 8.1, modified for use with different total EST counts from the source libraries. The web URL http://igs-server.cnrs-mrs.fr/~audic/cgi-bin/winflat.pl was used to calculate the probability intervals. A P value nearer to 1 indicates that the differential expression is likely to be significant. While prostate specific antigen (PSA) and glandular kallikrein (HGK) have been proposed as prostate cancer markers, both PS1 and PS2 are prostate specific. Thus, the down-regulation of PAP in stage B2 cancer is not significant using this test, whereas the test shows its up-regulation in the BPH sample to be more significant. So, for lower changes in copy number, where more sensitivity is expected, this test of significance is a valuable tool.
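
Equation 8.1 is simple enough to evaluate directly, and a short sketch may help make it concrete. The function name `audic_claverie` and the parameters `n_a` and `n_b` (total tags sampled from each library) are naming assumptions made here. The form used for unequal library sizes is the commonly quoted generalization of the test and should be checked against Ref. 20; with equal sizes it reduces to Equation 8.1. The sketch returns only the point probability and does not attempt to reproduce the probability intervals reported in Table 8.2.

```python
from math import exp, lgamma, log


def audic_claverie(g_a: int, g_b: int, n_a: float = 1.0, n_b: float = 1.0) -> float:
    """Probability of seeing g_b tags in library B given g_a tags in library A.

    n_a and n_b are the total numbers of tags sampled from each library.
    When n_a == n_b the expression is exactly Equation 8.1.
    """
    ratio = n_b / n_a
    # Work with logarithms so that the factorials do not overflow for large counts.
    log_p = (
        g_b * log(ratio)
        + lgamma(g_a + g_b + 1)      # log (g_a + g_b)!
        - lgamma(g_a + 1)            # log g_a!
        - lgamma(g_b + 1)            # log g_b!
        - (g_a + g_b + 1) * log(1.0 + ratio)
    )
    return exp(log_p)


# Equal library sizes: one tag observed in each state.
p_equal = audic_claverie(1, 1)

# Unequal library sizes, using illustrative counts and totals.
p_unequal = audic_claverie(4, 1, n_a=4500, n_b=1400)
```

Summing the function over all possible values of `g_b` for a fixed `g_a` gives 1, which is a quick sanity check that it behaves as a probability distribution.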

The relationship between gene expression, mRNA level, and protein expression is complex and not one that can be gleaned from collecting copy number information in this type of experiment. Even with careful statistical analysis, such as that described above, the assumption that increases or decreases in copy number reflect real biologically significant events relies on the confidence with which we can compare a library made from one set of cells to a library made from a different set of cells. Thus, most transcript analysis experiments setting out to be quantitative end up simply as target identification exercises. A major goal of proteomics is to generate a factory-type approach to profiling protein level expression that more closely reflects the biological reality. The EST approach has been turned into an industrial scale process but has not been able to impact the drug discovery process significantly because of the biological limitations described and the lack of sound mathematical modeling of the whole process.

Expression experiments are measures of cell population averages, not the contents of individual cells, so it is important to consider to what extent all cells in the candidate population are in the same state (22). Whereas work in single-celled organisms may be more straightforward to control, work in multi-cellular organisms has the added complexity that expression measurements may involve contributions from cells derived from a variety of tissues. Furthermore, when taking into consideration mRNA copy number, it should be understood that absolute transcript abundance measurements do not completely measure mRNA concentration.

Table 8.3 Brief Descriptions of Three Technologies for Genomic Scale Transcript Profiling

Expression Profiling Technology: cDNA array chip
Brief Description: Tens of thousands of cDNA clones of genes are placed onto a glass slide in a grid formation. Hybridisation of molecular probes (RNA extracts) to the clones is detected using a fluorescence system. By using two sets of probes, labelled with differently coloured fluorescent dyes, it is possible to assess expression differences.
Form of Data Generated: Fluorescence intensities and colours for each spot on the chip. The nature of the clones on the chip is known.

Expression Profiling Technology: High-density oligonucleotide arrays
Brief Description: Arrays of oligonucleotides are synthesised directly onto the glass chip using special chemistries and light sensitive masking. This generates arrays of known sequences of fixed length. Probes are hybridised to the arrays and computational analysis is necessary to interpret the resulting patterns.
Form of Data Generated: An image of the entire chip is processed using specialised chip scanning software.

Expression Profiling Technology: Serial analysis of gene expression
Brief Description: A sequence-based approach to the identification of differentially expressed genes through comparative analysis. Allows simultaneous analysis of sequences that derive from different cell populations or tissues. This is not a chip-based method. Identification of sequences relies on completeness of public sequence databases and, therefore, can only be used to analyse known genes.
Form of Data Generated: Sequence data for SAGE tags allows profiling of gene expression.

Although there was initially some concern that the use of ESTs was a shortcut to discovery of genes for the purposes of patenting and ring-fencing areas of research for profit, in fact, the substantial numbers of quality ESTs in the public domain have helped in the pinpointing of genes in genomic data and have contributed to the speed with which the human genome sequence was completed.

4.2.5 Genome-Wide Expression Analysis. A major step towards understanding how organisms work is the determination of the complete sequence of all genes in the genome. This remarkable goal has been achieved for a number of organisms, including Homo sapiens, the flowering plant Arabidopsis thaliana, the single-celled yeast Saccharomyces cerevisiae, and a large number of bacteria. The analysis of the sequence data then becomes the issue. It is no trivial task even to locate the positions of all the genes in the human genome. Genes for which there are no homologs in the current sequence databases will take some time to elucidate. See Ref. 23 for a detailed analysis of this topic and then Refs. 24 and 25 for detailed studies on the human genomic sequence.

The three basic technologies for generation of genome-wide expression information are cDNA microarrays, high-density oligonucleotide arrays ("GeneChips"), and serial analysis of gene expression (SAGE) (22). These technologies are outlined in Table 8.3.

In terms of quantities of data, a single microarray experiment looking at 40,000 genes from 10 different samples, under 20 different conditions, produces at least 8,000,000 pieces of data (40,000 × 10 × 20) (26). Chip technologies, though originally expensive because of the costs of chip fabrication, are now being used to contribute data to public domain databases and are widely used in industrial applications. A recent comparison of array databases available for local installation, public submission of data, or public query listed 13 different systems from sources worldwide (27). One such repository, ArrayExpress, is now being funded by the European Union at the EBI and is in the early stages of development (see http://www.ebi.ac.uk/arrayexpress). It is intended to be compliant with the microarray gene expression database (MGED) standard (see http://www.mged.org/).

The process of gene expression analysis by microarray is outlined below, based on Ref. 13.

1. Construct array
2. Prepare biological samples for investigation
3. Extract and label sample RNA
4. Hybridize samples to array
5. Image the array
6. Locate spots and evaluate fluorescent intensities
7. Construct gene expression matrix from spot intensities
8. Analyze gene expression matrix

As with any biological experiment, the result of this process should be the accumulation of knowledge concerning the biological processes under study. Interestingly, the first five steps are material handling processes, while the remaining steps only involve information processing.
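
The information processing steps (6 to 8 in the list above) can be illustrated with a small sketch of step 7, building a gene expression matrix from spot intensities. Several things here are assumptions rather than part of the published procedure: the use of a base-2 logarithm of the ratio between a test signal and a reference signal (a common convention for two-colour arrays), the function name `expression_matrix`, the nested dictionary input format, and the invented intensities. Every sample is assumed to report every gene, and all signals are assumed to be positive.

```python
from math import log2


def expression_matrix(intensities: dict[str, dict[str, tuple[float, float]]]):
    """Build a genes-by-samples matrix of log2(test / reference) ratios.

    `intensities` maps sample name -> gene name -> (test signal, reference signal).
    Returns (gene names, sample names, matrix), with one matrix row per gene.
    """
    samples = sorted(intensities)
    genes = sorted({gene for spots in intensities.values() for gene in spots})
    matrix = [
        [log2(intensities[s][g][0] / intensities[s][g][1]) for s in samples]
        for g in genes
    ]
    return genes, samples, matrix


# Invented spot intensities for two genes measured in two samples.
spots = {
    "sample1": {"gene1": (200.0, 100.0), "gene2": (50.0, 100.0)},
    "sample2": {"gene1": (100.0, 100.0), "gene2": (400.0, 100.0)},
}
genes, samples, matrix = expression_matrix(spots)
```

In this representation a value of 0 means no change relative to the reference, positive values mean higher signal in the test channel, and negative values mean lower signal. Step 8 then operates on `matrix` alone.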

5 DATABASES, TOOLS, AND 
APPLICATIONS 

Because bioinformatics is about the management of information in the domain of biology, databases play a significant part in acting as repositories for a wide variety of different types of data. The main focus of this section is to give a flavor of the breadth of databases available and to highlight the role of the primary sequence databases and the secondary pattern (or family) databases in assisting with protein functional assignment.


5.1 Databases 

Data repositories of DNA sequence, protein sequence, and higher-level resources of integrated information pertaining to the relationships between sequences (for example, pattern and profile databases) are the core tools for performing a wide variety of bioinformatics analyses. Many of these databases are in the public domain and are freely accessible through links available at a range of websites (including those listed in Table 8.1), although copyright is claimed in the annotation sections of some databases. The January 2002 issue of Nucleic Acids Research is a special annual database issue. It contains 112 articles describing in some detail different databases in use in the field. These are a subset of the 339 databases (up from 281 in the previous year) listed and briefly described in the Molecular Biology Database Collection, which constitutes an additional article in itself (28, 29). The complete list can also be found at http://www.nar.oupjournals.org. While the list was being prepared for the 2001 edition, 55 new databases were added to the previous total. In 2002, 58 additions were made. This rapid expansion in the number of databases available is indicative of the recognition by the community of the need for accessible, carefully designed databases to meet the needs of a wide diversity of research programs.

Much of the value of databases, assuming the provision of accurate sequence data, arises from the quality of the annotation that is available. This normally includes at least a brief description of the function of the sequence and essential references to the literature. Many databases include a lot more than this. In particular, SWISS-PROT (8) is viewed as the most reliable source for annotation information. SWISS-PROT emerged in the 1980s out of a need to have high quality, robust annotations for the protein sequences that made up its core content. However, the process of annotation is labor intensive and not one that is easily automated. Although the content of SWISS-PROT is well regarded, it lacks the completeness of the source DNA databases because of the necessary delay in incorporating newly annotated sequences. Indeed, a team of annotators is employed at the EBI solely to perform this task. A computationally annotated supplement, TrEMBL (8), has been made available to make up for this deficiency. Nevertheless, computer annotation still has some way to go before it comes close to the level of competence of skilled human annotators. This is an area of active research (30).

With the rapid generation of sequence data from genome scale experiments, however, more effective means of characterizing protein sequences and annotation are now required. The database has responded by improving labeling of annotation in both SWISS-PROT and TrEMBL and by adding more advanced and rigorous tagging of evidence for functional statements that have been made (31).

Whereas most sequences are available in the public domain for use in research and for commercial exploitation, there is a substantial body that is the subject of patent protection. It is often useful when conducting searches of sequence databases to be aware of the sequences that are patented because this may imply certain restrictions on the use to which these sequences can be put in a commercial context. The commercial repository is maintained by Derwent (Thomson Scientific), which generates the Geneseq database of patented sequences. This is a useful collection because it contains a broad historical collection as well as more recent examples, although the terms for a commercial license to use the database may be off-putting to some potential users. There are also patent sections of the GenBank/EMBL DNA databases, but these are of limited value because they contain only more recent sequence data.

5.2 Sequence Comparison 

When dealing with the output of most experiments in target discovery the question "has this gene been seen before?" arises. The answer is, at first sight, straightforward: compare the sequence obtained from the experimental output with all the known sequences and print the result.

Sequence comparison makes up a major part of the work of the bioinformatics analyst. It demands skill in operating the tools (for example, choosing the appropriate databases to search and selecting the appropriate search method), followed by insight and experience in assessing the meaning of the results of the search. A search query with a single previously known sequence is likely to return not only the match with itself but also a host of other matches at varying levels of similarity with the query sequence. This extra information can be very valuable in placing the query sequence in the context of many closely related sequences that make up the family of genes to which the query belongs. More distantly related sequence matches can potentially indicate genes with similar function, even if the match is relatively short and of low score.

The experienced analyst should be able to sort the significant matches from the uninteresting ones. Often, this type of experience is difficult, if not impossible with current technologies, to capture in a computer program. Rules that seem to work under some circumstances produce nonsensical results in others. As a result, many of the techniques used for current sequence comparison engines are heuristic rather than strictly algorithmic; that is, the rules that are implemented as part of the process for returning significant hits from the query database tend to produce the correct result but cannot be guaranteed to do so in all circumstances. For a fuller discussion of algorithms and heuristics, albeit outside the context of bioinformatics, see Ref. 32.

One of the key aspects of sequence comparison is the understanding of similarity when applied to molecular sequences. There are essentially two ways of considering this: simple residue identity and residue substitution. In this discussion, we consider the comparison of two protein sequences, but the process is the same for comparison of DNA or RNA sequences; only the alphabet differs, with 20 letters for protein sequences and 4 for DNA and RNA. By comparing residues at the same position in each sequence and counting up the number of identities, we arrive at a score that can be expressed as a percentage match for the pair of sequences. The alternative method compares each pair of residues and looks up a score for that pair in a substitution table or scoring matrix. The summed score across the whole sequence length can again be expressed as a percentage match. The two sequences under comparison are, however, likely to be sufficiently different that equivalent residue positions are not in register when the two sequences are laid out, one on top of the other. In this situation, the sequences must be aligned with each other so that equivalent residue positions are in register to make the score meaningful. This may involve insertion of gaps into one or both sequences. The skill here is to create an alignment between the two sequences that reflects some biological reality; it is from this biological reality that we derive the notion of equivalent residue positions. These positions can be deduced from manual manipulation of the alignment on the basis of mutation data or other functional information using a suitable sequence editor (33), or perhaps from understanding the spatial layout of residues if structural data are available. In each of these cases, the resulting sequence alignment will reflect the manner in which equivalent residue positions have been determined; both methods have their place. A variety of methods have been developed for comparing pairs of sequences, including the basic classical methods of Needleman and Wunsch (34) and Smith and Waterman (35).
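
To make the ideas of alignment, gap insertion, and percentage identity concrete, the following is a minimal sketch of global alignment by dynamic programming in the style of Needleman and Wunsch. It is not taken from Ref. 34. The scoring values (`match`, `mismatch`, and a linear `gap` penalty) are assumptions that stand in for a real substitution matrix, and the function names and example sequences are illustrative only.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Globally align two sequences; return (score, aligned a, aligned b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # leading gaps in b
    for j in range(1, cols):
        score[0][j] = j * gap          # leading gaps in a

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align the two residues
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding identical residues (gaps never count)."""
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


total, top, bottom = needleman_wunsch("GATTACA", "GATACA")
identity = percent_identity(top, bottom)
```

Replacing `pair` with a lookup in a substitution matrix gives the residue substitution scoring described in the text; the dynamic programming itself is unchanged.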

Extending these pairwise comparison methods to database searching has been carried out, and a plethora of hybrid methods and improvements have been made. The manner in which significant alignments are reported varies from implementation to implementation. Database searching by alignment in this way is computationally intensive and specialized computer hardware is often used to gain speed increases. Because the comparison of pairs of sequences takes place in an exhaustive manner, these types of database searching methods are considered to be the most sensitive. More modern methods of database searching look for shorter matches spread over the lengths of the query and database sequences, and then extend these matches until the score for the match falls below a threshold level. Lists of sequence matches returned are then aligned using a pairwise alignment technique to provide a match and score over the whole length of the comparison sequences. For an example of this type of approach, see FASTA (36). Such methods are readily implemented on standard computer hardware and thus are accessible as Internet resources or as local implementations on UNIX or Linux servers.
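
The first stage of such word-based searches, finding short exact matches shared by the query and the database sequences, can be sketched as follows. This covers the seeding step only and is not the FASTA algorithm itself: the extension of matches and the final pairwise alignment are omitted. The word length `k`, the function names, and the two toy database entries are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(database: dict[str, str], k: int = 3) -> dict[str, list[tuple[str, int]]]:
    """Map every k-letter word to the (sequence name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def seed_hits(query: str, index: dict[str, list[tuple[str, int]]], k: int = 3) -> list[tuple[str, int]]:
    """Count word matches per database sequence, best first (ties broken by name)."""
    hits = defaultdict(int)
    for pos in range(len(query) - k + 1):
        for name, _ in index.get(query[pos:pos + k], []):
            hits[name] += 1
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical protein fragments standing in for a sequence database.
toy_database = {"seqA": "MKTAYIAKQR", "seqB": "GGGSSSPPPL"}
toy_index = build_index(toy_database)
ranking = seed_hits("TAYIAK", toy_index)
```

Only sequences that share at least one word with the query appear in `ranking`, so the expensive alignment step needs to be run on a small fraction of the database. This is where the speed advantage, and the loss of guaranteed sensitivity, both come from.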

The most popular tool currently in use is BLAST (Basic Local Alignment Search Tool) (37) from the NCBI. BLAST is an example of a heuristic that attempts to optimize a specific similarity measure. The most recent revisions to the algorithm are gapped BLAST and PSI-BLAST (38), with improved accuracy for PSI-BLAST using composition-based statistics (39).

5.3 Phylogenomics and Gene Family 
Databases 

Determining protein function from genomic sequences is a central goal of bioinformatics (40), and to achieve this goal, comparing single sequences against databases of DNA or protein sequences is a necessary bioinformatics skill. However, many such searches have already been carried out, and the results are available to analyze at a higher level of abstraction in the protein and gene family databases (9, 10, 14, 41-43). It is the relationships between sequences that form the basis of any gene family database. Many of the current databases did not set out to become gene family databases. However, application of the underlying methodology for defining gene families (whether based on blocks of conserved sequence alignment, on profiles representing entire sequences, or on simple regular expressions) has resulted in a number of resources that are particularly valuable in placing drug discovery targets in their biological context.

Figure 8.4. An example of a user interface to a phylogenomic-oriented database (48). Relative distances, following black line paths, between nodes on the tree of phosphodiesterases indicate the similarity level between members of the family, based on the regions of the sequences selected for the phylogenetic analysis. Links to aligned domains permit the alignments themselves to be explored. The order in which the genes appear in the tree (the branching order) gives an indication of the homology relationship between members of the family. See Section 5.3.

The processes of evolution by natural selection imply that species are related to each other in a tree-structured hierarchy; but more than this, the history of sequence relationships during evolution is also significant. Organisms are defined by their genes, and their behavior is modified through environmental experience. The relationships between genes within a single organism indicate that genes and their protein products also fall into well-defined families. Protein phylogenetic profiling (40) and phylogenomic analysis (44) are methods that are valuable where functional assignment by sequence similarity alone is problematic. This is because phylogenomic analysis is based on understanding the process by which sequences have diverged from common ancestors rather than focusing on the sequence similarity itself, which is an evolutionary endpoint. The approach is to determine the phylogenetic tree of a gene family and then to overlay any known functions of the genes on the tree. The functions of uncharacterized genes are predicted by their phylogenetic positions relative to those of the previously characterized genes. Importantly, depending on the manner of their construction, the trees may indicate similarity distances along connecting branches, but it is the order of branching that reflects evolutionary relatedness (otherwise known as homology). For an interesting discussion of the correct use of the terms homology and similarity, see Ref. 45.
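
The overlay-and-predict step can be sketched with a toy tree. This is one simple reading of the approach, not the method of Ref. 44: the tree is assumed to have been built already and is written as nested tuples, the prediction rule (take the functions found in the smallest clade that contains both the query and at least one characterized gene) is an assumption chosen to respect branching order rather than branch length, and all gene names and function labels are hypothetical.

```python
def leaves(tree) -> list[str]:
    """List the gene names at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def predict_function(tree, query: str, known: dict[str, str]) -> set[str]:
    """Functions annotated in the smallest clade holding `query` and a characterized gene."""
    if isinstance(tree, str):
        return set()                      # a single tip carries no neighbours
    for child in tree:
        if query in leaves(child):
            found = predict_function(child, query, known)
            if found:
                return found              # a deeper, more specific clade answered
            break
    else:
        return set()                      # the query is not under this node
    return {known[leaf] for leaf in leaves(tree) if leaf != query and leaf in known}


# Hypothetical gene family: geneX and geneD are uncharacterized.
family_tree = ((("geneA", "geneX"), "geneB"), ("geneC", "geneD"))
annotations = {"geneA": "function-1", "geneB": "function-1", "geneC": "function-2"}

prediction_x = predict_function(family_tree, "geneX", annotations)
prediction_d = predict_function(family_tree, "geneD", annotations)
```

When the returned set holds more than one function, the tree alone does not resolve the assignment and further evidence is needed.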

This approach is illustrated in the phosphodiesterase gene family tree presented in Fig. 8.4. The set of relationships has been determined by comparing not just two sequences with each other, or a database of sequences to one sequence (as in a sequence database search), but by comparing a set of phosphodiesterase sequences to each other in the form of a multiple sequence alignment (see Fig. 8.5). Conserved regions of un-gapped sequence were chosen from this alignment to use as input to a phylogenetic analysis method (46, 47) and an evolutionary tree was eventually reconstructed. This tree represents a view of the relationships between genes in the phosphodiesterase gene family: more closely related genes are closer together in the diagram (i.e., they are connected by shorter paths); those further away are less closely related to each other. Figure 8.4 is based on an entry in TargetBASE (48), which adopts the phylogenomic paradigm. The relationships between members of gene families are used as the structure for an object-oriented database and associated user interface that provides a navigation tool for a curated gene index.

There are other approaches to family data¬ 
bases that rely more extensively on sequence 
similarity to define classes of genes or pro¬ 
teins. For example, PROSITE (49) is a re¬ 
source that uses regular expressions to define 
patterns of residues that represent biologi¬ 
cally significant sequence motifs. Recent ver¬ 
sions have incorporated profiles, weight ma¬ 
trices that express the characteristics of a gene 
Figure 8.5. Part of an alignment of catalytic domains of the human phosphodiesterase gene family.
Positions in the alignment where gaps have been introduced into a sequence to bring it into align¬ 
ment with other sequences are indicated by characters. 


family using all the sequence information 
available. The principal value of this resource 
is that it presents patterns for recognition of 
gene families that are relatively simple to un¬ 
derstand. The downside is that the use of such 
patterns can produce both true positive hits 
(members correctly predicted) and false posi¬ 
tive hits (members incorrectly predicted). 
PROSITE lists true and false positives for 
searches performed in the production of a re¬ 
lease of the database, but it is as well to be 
aware that when the patterns are used in iso¬ 
lation, there is often a false positive hit rate 
that must be taken into account by reconciling 
the results of a pattern search with the results 
of database annotations or other pattern
recognition methods.
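
To make the idea of a regular-expression motif concrete, the following sketch translates a PROSITE-style pattern into a Python regular expression and scans a few sequences with it. It is a simplified illustration under stated assumptions: only the common syntax elements are handled (`x`, `[...]`, `{...}`, repeat counts, and the `<` / `>` terminal anchors), the pattern shown is used purely as an example of the notation, and the sequences and the helper names `prosite_to_regex` and `scan` are hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional (n) or (n,m).
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{"):       # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def scan(pattern, sequences):
    """Return {name: matched substring} for every sequence the pattern hits."""
    regex = re.compile(prosite_to_regex(pattern))
    hits = {}
    for name, sequence in sequences.items():
        match = regex.search(sequence)
        if match:
            hits[name] = match.group(0)
    return hits


example_pattern = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."
example_sequences = {             # hypothetical test sequences
    "seq1": "AACPECGKAFSRKDHLKRHIRTHTG",
    "seq2": "MKTAYIAKQR",
}
print(prosite_to_regex(example_pattern))
print(scan(example_pattern, example_sequences))
```

A hit from such a scan says only that the residues fit the pattern. As the text notes, it still has to be reconciled with annotation or other methods before the sequence is accepted as a family member.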

The PRINTS system (9, 10, 50, 51) is an
approach based on an examination of core re¬ 
gions of un-gapped sequence conservation 
within a set of aligned sequences (multiple se¬ 
quence alignment). The method rigorously 
builds up fingerprints for a gene family 
through use of an iterative database searching 
technique allied to intelligently applied se¬ 
quence alignment. The fingerprints them¬ 
selves can then be used to diagnose new gene 
family members in novel sequence data or can
be used to identify modules of functional
sequence across different gene families.

One of the issues in using different data¬ 
bases of gene family information is that defi¬ 
nitions of which genes belong to which gene 
families can vary depending on the method 
used. Apweiler et al. have undertaken a useful 
effort at rationalizing and integrating family 
database annotation at the EBI in the InterPro
resource (52). The databases that make up
the membership of the InterPro consortium 
are PROSITE (49), PRINTS (9), Pfam (53), 
ProDom (54), and SMART (55). InterProScan 
is a tool that enables scanning of individual 
protein sequences against the InterPro mem¬ 
ber databases (56). 

6 THE BIOINFORMATICS KNOWLEDGE
MODEL 

Up to this point, we have discussed sources of 
data and means of manipulating and compar¬ 
ing data elements (in terms of sequences, 
alignments, gene families, etc.), but the end 
point of all this analytical process must be the 
acquisition of knowledge. It is through in¬ 
creased understanding that sound decisions 
can be made in applying the results of
bioinformatics analyses to application areas, such
as drug discovery. So, in this section we con¬ 
sider the relationships between data, informa¬ 
tion and knowledge, which are frequently re¬ 
garded as poor relations to laboratory-based 
experimental data acquisition. However, as 
drug discovery organizations, including large 
pharmaceutical and smaller biotechnical com¬ 
panies, develop a significant history of assays, 
screens, and leads, it is vital to have strong 
internal support for managing data flows, in¬ 
tegrating related data into information sys¬ 
tems, and transforming knowledge thus 
gleaned into tangible benefits. 

6.1 Data, Information, and Knowledge 

According to the University of California at 
Berkeley (57), it has taken 300,000 years for 
humankind to accumulate 12 exabytes of 
data. 2 It will take just 2.5 more years to create 
the next 12 exabytes. (An exabyte is 
1,000,000,000,000,000,000 bytes or a billion 
gigabytes.) This is a truly unimaginable 
amount of data, equivalent to the data stored 
on a pile of floppy disks 24 miles high. It is the 
rate of accumulation of data that is the key 
point of interest, however, and the fact that it 
is accelerating. 

It is crucial to distinguish between the 
terms data, information, and knowledge so 
that we can think clearly about the goal of data 
accumulation in our own industry sector. 
There are two views: a tiered hierarchical view 
and a more formally correct scientific view. 

6.2 The Hierarchical View 

In the hierarchical view, data is the bottom 
rung of a ladder leading to the accumulation of 
information that leads, ultimately, to an
increase in knowledge. Apply this hierarchical
principle to an everyday example of taking
this article to the photocopier: the data repre¬ 
sented by the article is the sequence of strokes 
and dots on the page that make up the page 
image. The page image is the information
represented by the article. The knowledge
element only comes later when the observer
actually reads and understands the article.
Compare this with the act of photocopying a
research article, a process that does not in
itself add to understanding on the part of either
the photocopier or of the researcher. The
acquisition of knowledge implies an active
relationship between author and recipient of the
information. In this intuitive sense, we know
that the hierarchical view works to some
extent as a model of the way in which some
knowledge is acquired.

2 In fact, the referred study uses the word
"information." However, within the usage of this
article "data" is a more appropriate term.

6.3 The Scientific View 

The second view is the scientific one (58). 
Here, we start with the piece of information 
that we are trying to understand, perhaps a 
gene whose function we plan to determine. Ex¬ 
periments are designed and performed to de¬ 
termine the characteristics of the function of 
the gene; such experiments yield data that de¬ 
scribe aspects of the information. Knowledge 
comes from understanding and interpreting 
the results of the experiments. Again, knowl¬ 
edge is accumulated as part of an active rela¬ 
tionship between the data describing the in¬ 
formation and the investigator reviewing the 
data and drawing conclusions about the state 
of the information. Gene function is itself a 
complicated concept because the functions of 
gene products can rarely be assessed in isola¬ 
tion, owing to the network of interactions in 
which most genes are involved. A collection of 
sequence data, collected at the DNA or protein
level, describes the molecular structure of a
gene or its product at a primary level—it is 
not, however, a complete description. There 
are other biochemical factors to be considered; 
for example, proteins that assist in the folding 
process to create an active three-dimensional 
molecule, post-translational modifications, 
glycosylation, interactions with other mole¬ 
cules to generate a higher-level function, etc. 

6.4 Data is Not Knowledge 

Simply increasing the amount of data in the 
genomic universe does not necessarily in¬ 
crease the speed of knowledge acquisition. In 
short, data is not knowledge. Knowledge itself 
requires understanding and demands the ac¬ 
tive participation of the one acquiring the 
knowledge. 

Most pharmaceutical organizations have in
place the means for collating data from a wide 
variety of sources. Genomic information is 
available freely in the public domain as well as 
in proprietary databases. Some successful 
companies have used the multiple subscrip¬ 
tion database model to generate revenue to 
create more data to return to their customers. 
In this model, data tends to be available on a 
non-exclusive basis but it is up to the licensee 
to determine how best to interpret the data in 
its own research environment. Some informa¬ 
tion linking is available in such models, more 
especially those that use Internet portals as a 
user interface and results delivery mecha¬ 
nism. This has been a valuable means of ac¬ 
quiring data in gene and protein expression. 
The model is, however, showing signs of age. 
Pharmaceutical and biotechnology companies 
have needed to make substantial investments 
in technology and specialist skills (particu¬ 
larly bioinformatics) just to warehouse the 
data and make the analytical results available 
to drug discovery program scientists. Yet, all 
this effort still remains the heartland of 
genomics: target discovery groups have 
sprung up in the pharmaceutical and
biotechnology companies to create a process by means
of which targets can be gleaned from the 
genomic morass. 

6.5 Drug Targets 

When considering drug targets, as opposed to 
simply gene products, there are a host of char¬ 
acteristics that must be taken into consider¬ 
ation. For all the drugs that are currently 
available on the commercial market, there are 
only about 500 drug targets on which they
individually act. We now understand the human
genome to contain about 30,000 genes, which 
give rise to a still incalculable number of pro¬ 
tein products (59). How many of these prod¬ 
ucts represent tractable drug targets? The 
gold standard for assessing whether a protein
is or is not a target is target validation. In this 
process, additional biochemical or molecular 
genetic data are required to determine 
whether a protein is truly involved in a disease 
state and thus whether it could be considered 
to be a suitable target molecule for drug
discovery. The prospect of performing such
target validation experiments, always assuming
the results could be interpreted unambiguously,
is a daunting one. Structural determination,
by X-ray crystallography or nuclear
magnetic resonance, has deepened our
understanding of some biological processes
immeasurably, particularly in the realm of certain
proteases, DNA binding proteins, and some 
other soluble enzymes. The majority of drug 
targets are, however, membrane bound (for 
example, the plethora of ion channels and
G-protein-coupled receptors), making
structural determination to any degree of critical
confidence impossible. Molecular modeling 
can assist in this process and has been a valu¬ 
able tool for many years in thinking through 
possibilities and providing a framework for
interpreting other biochemical results. The
further away we move from rigorously
determined experimental data, however, the less
likely is a pharmaceutical company to embark 
on the commitment of expense to exploratory 
studies in drug discovery. Finally, in consider¬ 
ing the suitability of a gene as a drug discovery 
target, we must take into consideration the 
temporal nature of gene expression, an area of 
research that has not yet been adequately ad¬ 
dressed in the analysis of genomic data. 

The alternative approach for dealing with 
the wealth of genomic data, in the context of 
drug discovery, is to consider the genomic data
as a background landscape against which to 
pick out the currently known and validated 
drug targets. These targets fall into families 
whose relationships can be rigorously deter¬ 
mined at the sequence level. The related mem¬ 
bers of these families can then be assessed by 
analogy to determine their appropriateness as 
drug targets. These phylogenetic (gene family)
relationships then form the basis of the struc¬ 
ture for a database and become both a tool for 
navigating and exploring the relationships 
themselves as well as a mechanism for inte¬ 
gration of other types of drug discovery data— 
for example, high-throughput screening re¬ 
sults and structure activity relationships of 
bound ligands. Thus, we can see that an inte¬ 
gration of information across the realms of 
genomics, target discovery, screening and lead 
optimization becomes possible and an achiev¬ 
able goal. 

Target analysis belongs in the domain of 
knowledge for drug discovery. Such
knowledge is a representation of our understanding
of the function of a gene product in a disease 
state. This functional understanding is de¬ 
rived from the analysis of experimental data 
linked to our experience of the functional in¬ 
formation, itself defined (if only partially) by 
the data that characterize that functional in¬ 
formation. Setting such knowledge within the 
context of the genomic universe enhances our 
ability to select new targets for future valida¬ 
tion studies, either through molecular genetic 
techniques (gene knock-out, anti-sense, etc.) 
or through mechanistic validation using small 
molecule tools discovered as part of the assay 
development and screening process. 

There are attempts at capturing this type 
of knowledge within databases (known as 
knowledge-bases). Much valuable research is 
going on in the allied areas of knowledge rep¬ 
resentation. At present, active human partici¬ 
pation is required in accumulating knowledge 
and deriving ultimate benefit from it in the 
form of new drugs that fulfill unmet human 
needs in therapeutic situations. 

7 STRUCTURAL GENOMICS 

Structural genomics is the process of deter¬ 
mining the three-dimensional structures of an 
organism's proteome (60). Predictions of pro¬ 
tein function can be attempted from knowl¬ 
edge of the structure alone, or the additional 
information gained can be used to inform
sequence-based methods of functional
prediction (61).

The traditional paradigm of classical struc¬ 
tural biology has been to select a protein based 
on its known biological function, ascertain its 
molecular structure, and use the data thus 
gleaned to understand how its biological func¬ 
tion is carried out at the molecular level. To 
this end, more than 12,000 structures have 
been determined, to varying degrees of resolu¬ 
tion and confidence. Much has been learned, 
as a result, of the complexity of protein struc¬ 
ture and the manner of interaction of proteins 
and their native ligands, or of proteins and 
small molecule drugs. 

The essence of structural genomics is to 
start from a gene sequence, produce the
functional protein, and then determine its
three-dimensional structure. The biological
function of the protein in vivo is then deduced from
an understanding of the structure. In this par¬ 
adigm, there is no limitation on the number of 
structures that can be determined except the 
ability to purify sufficient protein for crys¬ 
tallization trials. The usual caveats apply 
regarding the solution of structures of
membrane-bound proteins, for example,
G-protein-coupled receptors, ion channels, certain
classes of kinases, etc. 

Often, the most useful functional informa¬ 
tion is derived from the structures of protein- 
ligand complexes because they reveal the na¬ 
ture of the bound ligand and its location in the 
protein. In the case of enzymes, a catalytic 
mechanism can often be postulated taking 
into account the disposition of residues in the 
active site pocket. While such structures have 
traditionally been determined by design, the 
ligand is unknown in the structural genomics 
approach. Only in rare cases will a ligand be 
co-crystallized by serendipity from the cloning 
organism. 

From the perspective of bioinformatics, it is 
important to appreciate that structural deter¬ 
mination can only provide data that reflect the 
biochemical or biophysical properties of the 
protein. The biological role in the cell or or¬ 
ganism is a complex of interactions including 
spatial and temporal dimensions. Sometimes 
information can be derived using other tech¬ 
niques—for example, cDNA expression analy¬ 
sis, two-dimensional gel electrophoresis, bio¬ 
chemical assay, etc. Bioinformatics techniques 
building on other types of data can also assist 
in providing biological context for the function 
of a protein—for example, phylogenetic, fin¬ 
gerprint or regular expression analyses, etc. 
All of these techniques yield data that, when 
reviewed as a whole, can direct the course of 
further experiments or influence experimen¬ 
tal design. 

We have seen that, purely using techniques 
of sequence comparison, the function of about 
40% of genes sequenced from genome projects 
can be inferred from sequence identity or sim¬ 
ilarity measures or by motif comparison using 
a variety of techniques. It is known that pro¬ 
teins exhibiting insignificant sequence simi¬ 
larity often adopt similar tertiary structures, 
which themselves have similar (or at least
related) molecular functions. In fact the variety
of types of fold taken up by polypeptides is 
thought to be quite limited [SCOP (62) and 
CATH (63)]. Discovering a useful relationship 
between folding topology and sequence, which 
can be used to predict folding accurately is, 
however, not trivial. By comparing the struc¬ 
tures newly determined from structural 
genomics initiatives with structures already 
deposited in the Protein Data Bank, it may be 
possible to extend the inference of molecular 
function further than that achieved from se¬ 
quence comparisons alone. Once the molecu¬ 
lar function has been characterized in this 
computational way, we may begin to postulate 
the cellular function of the protein under anal¬ 
ysis. 

7.1 Predicting Protein Function from 
Structure 

Some consider the "Holy Grail" of computa¬ 
tional biology to be the accurate prediction of a 
protein's function solely from knowledge of its 
primary sequence. We have already briefly 
mentioned the role of structural information 
in guiding and illuminating the process of mo¬ 
lecular sequence alignment (Section 5.2.1). 
Structural data can be a truly effective means 
of understanding spatial relationships be¬ 
tween amino acid side-chains, backbone donor 
and acceptor groups, and the means of inter¬ 
action of natural and man-made drugs. For a 
thorough and insightful overview of these 
matters see Ref. 64—the entire volume is es¬ 
sential reading. 

Many methods have been proposed for cap¬ 
italizing on our understanding of protein 
structure by creating algorithms that attempt 
to predict function from structure, or place 
proteins in structural categories that may 
have implications for functional analysis. 

7.2 Neural Networks and Protease Function 

Stawiski et al. (60) recently performed a study
of the unique structural features of proteases 
in which the authors noted consistent struc¬ 
tural similarities among unrelated protease 
family members. They found that proteases 
tend to be more tightly packed than other
proteins and they tend to have fewer α-helical
regions and more residues in loop structures.
A neural network was trained to predict
protease function with 86% accuracy in a test set.
Neural networks are an example of a tech¬ 
nique used in bioinformatics for generating a 
predictive program from a set of weights that 
can be applied in a learning tool. The tool is 
trained by using parameters that show dis¬ 
crimination between, in this case, proteases 
and non-proteases. In this example, 36 pro¬ 
teases were tested. Each protease in turn was 
used as a test example, the network being 
trained using the remaining 35 proteases. In 
31 of 36 cases (86%), the network correctly
identified the held-out protease. When the
same test was performed on 258 counter-examples,
87% were correctly classified as
non-proteases.
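
The evaluation scheme described here, in which each example is held out in turn while the model is trained on the rest, is known as leave-one-out cross-validation. The sketch below shows the bookkeeping only. It is not the network used in the study: a simple nearest-centroid classifier is substituted for the neural network to keep the example short, and the two-feature data points and all function names are hypothetical.

```python
def nearest_centroid_predict(training, query):
    """Predict a label for `query` from the closest class centroid.

    training -- list of (feature_tuple, label) pairs
    query    -- feature tuple to classify
    """
    sums, counts = {}, {}
    for features, label in training:
        totals = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            totals[i] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_distance = None, float("inf")
    for label, totals in sums.items():
        centroid = [total / counts[label] for total in totals]
        distance = sum((q - c) ** 2 for q, c in zip(query, centroid))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


def leave_one_out_accuracy(examples):
    """Hold out each example in turn, train on the rest, and score it."""
    correct = 0
    for i, (features, label) in enumerate(examples):
        training = examples[:i] + examples[i + 1:]
        if nearest_centroid_predict(training, features) == label:
            correct += 1
    return correct / len(examples)


# Hypothetical, well-separated toy data: (feature 1, feature 2), label.
toy_examples = [
    ((0.0, 0.0), "protease"), ((0.0, 1.0), "protease"), ((1.0, 0.0), "protease"),
    ((10.0, 10.0), "other"), ((10.0, 11.0), "other"), ((11.0, 10.0), "other"),
]
print(leave_one_out_accuracy(toy_examples))

# The accuracy quoted in the text is the same ratio: 31 correct out of 36.
print(round(100 * 31 / 36))  # 86
```

Because every example is scored by a model that never saw it during training, the resulting accuracy is a fairer estimate of performance on new proteins than accuracy measured on the training set itself.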

7.3 Fold Compatibility Methods 

The ability to recognize the way in which a 
protein sequence is folded in three dimensions 
should enable us to model the interactions of 
specific side-chains in a manner that is simply 
not possible when considering proteins en¬ 
tirely at the sequence level. This notion has 
resulted in sequence threading algorithms 
that assess the level of compatibility of a se¬ 
quence with a database of fold patterns (65, 
66). The principal downside to this approach is 
that novel structural types cannot be
predicted, because at least one example of each
fold type must be present in the fold pattern 
database. Structural genomics may be the 
means whereby fold pattern databases can be 
populated with sufficient data to make them 
useful as predictive tools. 

7.4 CASP and the State-of-the-Art 

Currently, methods of structure prediction 
from sequence perform poorly. The results of 
the biennial CASP experiment (Critical
Assessment of techniques for protein Structure
Prediction; see http://predictioncenter.llnl.gov/)
are equivocal, to say the least. A recent
report on the improvements of aligning target 
sequences to a structural template (67) indi¬ 
cates that over the last four CASP competi¬ 
tions there was no significant improvement in 
quality in this key step in the prediction pro¬ 
cess. Alignment remains the major source of 
error in all models based on less than 30%
sequence identity. The subjective impression is
that structure prediction is getting better year 
after year. This analysis, however, seems to 
suggest there is some way to go before reliable 
models can be generated for fold types not yet 
available in the structural databases. 


8 THE FUTURE 

Bioinformatics is a wide-ranging science that 
has developed over the last 50 years, since the 
discovery of the structure of DNA, a period 
that has resulted in the sequencing of the en¬ 
tire genomic material of major species. Tech¬ 
niques of sequence comparison, database 
management, design, and curation have re¬ 
sulted in a healthy base on which to build 
more automated systems. It is this author's 
view that experienced sequence analysts will 
always have a place in this process, guiding the 
design of new algorithms and better knowl¬ 
edge bases. Only in this way will the true syn¬ 
ergy between analyst and developer be real¬ 
ized and contribute to the understanding of 
the fruits of genomic research. 

Bioinformatics, allied to drug discovery, is 
used for discovering new potential drug tar¬ 
gets through the use of standard bioinformat¬ 
ics techniques in assigning function to novel 
gene products at the sequence level and by the 
informed use of structural, mutational and 
biochemical data—reflected in sequence level 
alignment models. Assessment of expression 
levels of genes and the statistical relevance of 
differences in levels of expression at the 
mRNA level has contributed to drug discovery 
programs in pharmaceutical and biotechnol¬ 
ogy companies globally. In many respects, it is 
still early days for seeing the fruits of this 
work in the products offered on the market by 
these companies. There should, however, be 
many clinical candidates on trial in which 
bioinformatics has contributed, albeit at a 
level of detail that is frequently far below the 
level of interest of industry publicists. The fact 
is that bioinformatics is now engrained in the 
discovery process for new drugs. The next 
stage in its development will be integration 
between chemoinformatics (chemical infor¬ 
matics) and bioinformatics, driven by the need 
to understand the ways drugs interact with
their targets rather than merely exploiting
biological assay systems as tools for drug
discovery.

1 INTRODUCTION

The term drug discovery once encompassed 
only those activities that were traditionally 
practiced by synthetic chemists — the design, 
synthesis, analysis, and testing of new chemi¬ 
cal entities. Until the 1980s, most drug discov¬ 
ery was conducted in a serial fashion. Thus, a 
chemist working on a given project would de¬ 
sign a series of structures, then synthesize 
them one after another, in milligram quanti¬ 
ties (large by today's standards), and finally 
send batches of the compounds for analysis 
and assay. Based on the assay results, the 
chemist would design new or modified sets of 
structures and repeat the cycle until a market¬ 
able entity was obtained. This serial, iterative 
procedure was adequate in a time when a few 
major drug companies were doing drug design, 
and the number of therapeutic targets was
relatively small. One consequence of this
approach was a much higher number of
"me-too" drugs on the market than we see today,
likely because of the intensive time and re¬ 
source that was devoted to each new chemical 
entity. 

1.1 Motivation for Chemical Information 
Management 

The serial approach to drug discovery is very 
costly in time and resource. Figure 9.1 shows 
an idealized view of the drug discovery "fun¬ 
nel" in which a (very productive) hypothetical 
chemist could produce 10-20 structures a 
week. In the 1970s, it was estimated that 1 in
7000 compounds synthesized and tested 
would eventually reach the market. That 
number has risen over the years to about 1 in 
10,000—a figure that holds true to this day, 
despite advances in combinatorial and high- 
throughput chemistry, molecular modeling, 
structure-based drug design, diversity analy¬ 
sis, and quantitative structure-activity rela¬ 
tionships (QSAR). In real dollars, it almost 
certainly costs as much or more to bring a new 
drug to market than it did in the 1970s. A 
commonly quoted figure of $500 million per 
marketable drug has been questioned by a 
Ralph Nader watchdog group, but the figure is 
certainly in the hundreds of millions of dollars 
(1). To balance the many computational
advances that have been made in the past 30
years are factors of increased competition in 
the field, many more therapeutic targets, in¬ 
creased regulation, and very importantly, the 
flood of information flowing from high- 
throughput methods. The advent of high- 
throughput combinatorial chemistry has in¬ 
creased the number of structures a chemist 
can generate by 100- to 1000-fold, with a cor¬ 
responding increase in the amount of data 
that must be gathered, stored, and processed. 
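
The arithmetic behind these figures is easy to check. The sketch below combines the numbers quoted in the text (10-20 structures per week, a success rate of about 1 in 10,000, and a roughly 10-fold reduction per stage of the funnel) with one assumption that is not in the source: 50 working weeks per year. The function names are illustrative only.

```python
def career_output(structures_per_week, years, weeks_per_year=50):
    """Total structures one chemist produces (weeks_per_year is an assumption)."""
    return structures_per_week * weeks_per_year * years


def expected_drugs(compounds, compounds_per_drug=10_000):
    """Expected number of marketed drugs at a 1-in-10,000 success rate."""
    return compounds / compounds_per_drug


def funnel(start, fold=10):
    """Compound counts surviving each stage of a `fold`-fold funnel."""
    counts = [start]
    while counts[-1] >= fold:
        counts.append(counts[-1] // fold)
    return counts


low = career_output(10, 20)    # slowest pace, shortest career
high = career_output(20, 30)   # fastest pace, longest career
print(low, expected_drugs(low))     # 10000 1.0
print(high, expected_drugs(high))   # 30000 3.0
print(funnel(10_000))               # [10000, 1000, 100, 10, 1]
```

Under these assumptions a serial chemist would expect somewhere between one and three marketed drugs in an entire career, which is why a 100- to 1000-fold increase in throughput changes the economics, and the data volume, so sharply.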

Figure 9.1. Traditional "serial" drug design costs. The drug discovery "funnel" typically shows
about a 10-fold reduction at each stage in the process. A chemist who could produce 10-20 structures
per week would be lucky to discover a single marketable drug in a 20- to 30-year career.

To deal with the flood of information (chemical,
biological, and clinical), it became
essential over the years to develop chemical
information computing systems (i.e., chemical
and reaction database systems) from which
the chemist and biologist could obtain
up-to-date information about commercially
available and in-house structures, reactions, and
data. This chapter briefly describes the history 
of these systems, the current state of chemical 
information management as it applies to drug 
discovery, and a look at future developments 
in the field. The coverage is primarily aimed at 
corporate applications of chemical
information management, as practiced in the
pharmaceutical industry. The expanding use of
microcomputers running Microsoft Windows or
Linux operating systems means that many of 
the programs and database systems now used 
in industry can also be installed and applied in 
academic settings. Much of the innovation in 
chemical information management comes 
from academia, whereas most of the
application has been seen in industry. This review is
limited to the management and storage of 
chemical structure information in databases. 
Other chapters deal with the generation of 
this information (molecular modeling, prop¬ 
erty calculation) and with the use of the infor¬ 
mation in drug discovery (library design, dock¬ 
ing and structure-based drug design, and 
QSAR). By analogy with another rapidly
expanding field, bioinformatics, the term
cheminformatics (or chemoinformatics) has
recently become common to describe the
acquisition, management, and use of chemical 
information. 

1.2 Literature, References, Societies, 
and Research Groups 

The literature of chemistry is vast, and chem¬ 
ical information management occupies a small 
corner of this domain. The chemical informa¬ 
tion literature overlaps that of computer sci¬ 
ence, database management, molecular mod¬ 
eling, QSAR, and even mathematics. The 
primary journals that publish chemical infor¬ 
mation articles are the American Chemical So¬ 
ciety's Journal of Chemical Information and 
Computer Sciences, Kluwer’s Journal of Com¬ 
puter-Aided Molecular Design, and Elsevier's 
Journal of Molecular Graphics and Modeling. 
Less frequently, chemical information articles 
appear in Wiley’s Journal of Computational 
Chemistry, Quantitative Structure-Activity
Relationships, and Journal of Chemometrics, the
ACS Journal of Medicinal Chemistry and the
Journal of Organic Chemistry, and Elsevier's
Analytica Chimica Acta, Computers and
Chemistry, and Chemometrics and Intelligent
Laboratory Systems. Other journals with
articles on chemical information include the 
University of Bayreuth’s Communications 
in Mathematical Chemistry (MATCH), Else¬ 
vier's Drug Discovery Today, ACS’s Modern 
Drug Discovery, and a handful of newer peri¬ 
odicals (2). 

The history of chemical information man¬ 
agement has recently been catalogued online 
by the Chemical Heritage Foundation (3). The
American Chemical Society has a Division of 
Chemical Information (CINF), and divisional 
symposia are held at national meetings of the 
ACS, often in conjunction with other divisions 
including Medicinal Chemistry, Computers in 
Chemistry, and Pesticide Chemistry. The 
Skolnik Award is given annually by the ACS 
Division of Chemical Information for 
"achievement in the areas of computerized in¬ 
formation systems, chemical information, 
chemical indexing and notation systems, no¬ 
menclature, structure-activity relationships, 
and numerical data analysis and correlation." 
Herman Skolnik, who died in 1994, was the 
first recipient. He founded the Journal of 
Chemical Documentation, which became the 
Journal of Chemical Information and Com¬ 
puter Sciences, and he made many contribu¬ 
tions to the field (4). Besides the ACS, other 
national and international meetings on chem¬ 
ical information include the Noordwijkerhout 
Conference on Chemical Structures (5), the 
Quantitative Structure-Activity Relationship 
Gordon Conference (6), and the International 
Conference on Chemical Information (7). 

Except for journal articles and some confer¬ 
ence proceedings, very recent general books 
on chemical information management are 
rather few in number. This is caused in part by 
the rapid changes in a field so closely tied to 
computer hardware and software develop¬ 
ment. Another reason for the paucity of texts 
is that most chemical information manage¬ 
ment systems are commercially developed and 
marketed, not widely used by universities, and 
in many cases, they use trademarked or even 
patented technology. Some texts of note in the 
last decade include several by Collier (8), Mar¬ 
tin and Willett (9), and Warr and Suhr (10), 
one by Wiggins and Emry (11), and ones by 
Maizell (12) and Ash et al. (13), and a book on 
chemical searching by Ridley (14). 


Most chemical information research and 
development is conducted by commercial soft¬ 
ware vendors and in-house at pharmaceutical 
firms. A small number of academic research 
groups study chemical information. The Com¬ 
putational Information Systems group at the 
University of Sheffield, under Peter Willett, 
has been very active in studying database 
searching (15). The Computer-Chemie-Centrum
at the University of Erlangen under Johann
Gasteiger focuses on organic structure
representation and reaction classification 
(16). Numerous other academic groups are ac¬ 
tive in QSAR and modeling research, de¬ 
scribed in other chapters in this series. 

In addition to the academic groups already 
mentioned, a number of online resources deal 
with chemical information management. Ex¬ 
amples include the comprehensive CHEM- 
INFO site at Indiana University (17), Cam¬ 
bridge Health Institute's Cheminformatics 
Glossary (18), the Chemical Structure Associ¬ 
ation (19), the Computational Chemistry List 
(CCL) (20), the Molecular Graphics and Mod¬ 
eling Society (21), the Open Molecule Founda¬ 
tion (22), the QSAR and Modeling Society 
(23), the Royal Society of Chemistry Chemical 
Information Group (24), and the UK QSAR 
and Cheminformatics Group (25). 


1.3 Brief History of Chemical Data 
Management 

The history of chemical information manage¬ 
ment parallels the history of computers. It can 
be roughly viewed in terms of decades of de¬ 
velopment (Fig. 9.2). 

1.3.1 Pre-1980—Flat File Storage of Chem¬ 
ical Structures. Computers consisted of main¬ 
frame machines (e.g., IBM 3090) and small 
minicomputers (Digital, Prime). Users con¬ 
nected through low speed serial connections, 
using "dumb" terminals (no graphics capabil¬ 
ity) or monochrome vector graphics terminals 
such as Tektronix and Imlac. Chemical structures
were mainly stored either (1) as individual
structure files, indexed by name and handled
one or a few structures at a time, or (2) in
a flat-file database accessed by record number
(26). A typical corporate database contained
up to a few tens of thousands of structures.

Figure 9.2. Evolution of chemical information storage. Panels: 1970s, individual files of chemical
structures; 1980s, flat databases of chemical structures and reactions; 1990s, relational database
framework (Oracle, Microsoft Access), with tables linked through columns such as ID, extreg, Mwt,
formula, and keys; 2000s, chemical data marts and data warehouses (the "star" schema). The storage of
chemical information has typically lagged the development of database management systems, but it is
catching up. In the 1970s, structures were stored in individual molecule files or large concatenated
files. In the 1980s, proprietary databases of structures and reactions appeared, in which a single
record contained all the information for a given structure. In the 1990s, this information was
distributed into tables in a relational database. In the 2000s, we see the application of the concepts
of data warehousing and data marts that consolidate information from a variety of sources for
transactional and/or analytical purposes.


In-house chemical information management systems began to emerge at
some of the larger chemical and pharmaceutical firms. These included
CONTRAST and SOCRATES at Pfizer, SYNLIB at SmithKline, COUSIN at
Upjohn, MSDRL/CSIS at Merck, and CROSSBOW at ICI (27). The Chemical
Abstracts database was made available online in 1967 (28). In 1980
this became CAS ONLINE. A comprehensive study of the user acceptance
of CAS ONLINE was published in 1988 (29). The first commercial
chemical structure database systems appeared in the late 1970s. These
offered an in-house solution using a mainframe chemical structure
management system with a graphical interface, which could be accessed
by interactive graphics terminals. A standard program in widespread
use was the MACCS program (Fig. 9.3). Structures could be drawn,
registered, searched, and output to files. The systems were only
slightly customizable, and the graphics terminals, which used vector
displays, were large and expensive.

Figure 9.3. MACCS (the Molecular ACCess System), an early structure indexing system. This program
originally used fixed menus for searching, registration, and reporting. Later versions allowed users
to customize the menus. The figure shows the result of a 3D pharmacophore search for ACE inhibitors.
Out of a database of 115,000 structures, 21 fit the 2D and 3D requirements of the search query. The
user could typically browse the "hits" from the search, save the list of structures to a list file,
and output the structures to a structure-data file (SDFile). The MACCS database was a proprietary flat
database system in which data of a given type, say, formula, was stored in a given file, indexed by
the compound ID number.

1.3.2 The 1980s—Flat Database Stor¬ 
age. This was the era of minicomputers 
(Prime, VAX) and a period of immense growth for chemical information,
molecular modeling, and QSAR. In industry, chemical structure
databases consisted mainly of custom-designed "flat" databases (where
each record in a given table refers to a given structure in the
database, much like a row in a spreadsheet). Client-server
architectures appeared, and personal computers replaced graphics
terminals and workstations. Highly successful PC-based "personal"
chemical information systems appeared, which included chemical
structure drawing and text processing programs (e.g., ChemDraw,
ChemText) and personal chemical databases (e.g., ChemBase) (30).
Customizable mainframe systems appeared (31), as did reaction indexing
and searching systems (32). Additional commercial chemical information
vendors appeared, including Daylight Chemical Information Systems,
Chemical Design Ltd., DARC-Questel, and Cambridge Scientific
Corporation. The Beilstein System came online in 1988 (33). In-house
and commercial databases typically held 100,000 to 200,000 structures.
The rapid and accurate conversion of two-dimensional (2D) structures
to three-dimensional (3D) models became possible using the program
CONCORD, introduced by Pearlman in 1987 (34). This enabled the
introduction of 3D structural databases with the ability to generate,
store, and search 3D molecular models on a large scale. These 3D
database systems included ALADDIN by Daylight Chemical Information
Systems, UNITY3D by Tripos, CHEMDBS3D by Chemical Design Ltd., and
MACCS3D by MDL (35).

1.3.3 The 1990s—Relational Data Stor¬ 
age. This period saw the decline of single¬ 
computer mainframe chemical management 
programs and the rise of server-based systems 
and distributed computing. By far, the most 
significant influences on chemical information 
management were the Internet, the introduction
of relational database technology, and the
shift to high-throughput combinatorial chem¬ 
istry. In a relational database, information 
that formerly was kept in a single large table is 
stored in numerous smaller tables, indexed by 
"keys." This is a much more flexible architec¬ 
ture, and combining different fields from sev¬ 
eral tables into a "view" of the data gives the 
user the impression of a single large table, as 
before. At the end of the decade, chemical and 
pharmaceutical firms could obtain chemical 
structure, reaction, and 3D model databases 
from a variety of vendors. These databases
were even somewhat integrated with molecular
modeling, quantum mechanics, and docking
programs, and with literature, spectra, and
biological databases. The largest database of
known chemical structures, the Chemical Ab¬ 
stracts Registry, grew to about 20 million 
structures, whereas a typical corporate inven¬ 
tory increased to between 100,000 and 
1,000,000 structures. A database of billions of 
virtual chemical structures was constructed 
and made available for drug-design purposes 
by Tripos, Inc. (36).
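
The key-and-view idea described in this section can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the table names (compound, property), the column names (extreg, formula, mwt), and the registry number REG-0001 are hypothetical, loosely echoing the labels in Fig. 9.2, and do not reproduce the schema of any commercial system.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE compound (id INTEGER PRIMARY KEY, extreg TEXT UNIQUE);
    CREATE TABLE property (extreg TEXT REFERENCES compound(extreg),
                           formula TEXT, mwt REAL);
    -- The view joins the small tables back into one wide, "flat" table.
    CREATE VIEW compound_flat AS
        SELECT c.id, c.extreg, p.formula, p.mwt
        FROM compound AS c JOIN property AS p ON p.extreg = c.extreg;
""")

# One compound (camphor) spread over two tables, linked by the extreg key.
conn.execute("INSERT INTO compound VALUES (1, 'REG-0001')")
conn.execute("INSERT INTO property VALUES ('REG-0001', 'C10H16O', 152.23)")

rows = conn.execute("SELECT * FROM compound_flat").fetchall()
print(rows)
```

Querying the view returns a single row that looks like a record from the older flat databases, although the data live in separate keyed tables.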

1.3.4 The 2000s. Like the customization 
and distributed computing of the 1980s that 
followed the introduction of mini-mainframe 
systems, the 2000s are witnessing the cus¬ 
tomization and further distribution of rela¬ 
tional and integrated database systems. 
Chemical structure-specific and reaction-specific
search types can be integrated into relational
databases, to take maximal advantage
of the scale and performance of these systems.
We see the increasing use of web-based clients, 
also known as "thin" clients, because they 
need little software other than a web browser. 
Former single databases are turning into dis¬ 
tributed and replicated database systems, and 
we see increasing use of data marts and data 
warehouses, more fully integrated structure, 
reaction, data, and citation searching, and in¬ 
creasingly "intelligent" database systems. 

2 CHEMICAL REPRESENTATION 

Chemical structures and reactions can be represented
in many ways. At the most fundamental
level, the parameters of the time-dependent
Schrödinger equation (the atomic
and molecular orbitals) do a more or less
complete job of characterizing a chemical
compound. Storing and representing structures as
suitable for thousands or millions of struc¬ 
tures; nor is such a representation useful for 
drug discovery, except perhaps to a molecular 
modeler. Synthetic chemists still function in a 
mostly 2D chemical structure space. Intuition, 
training, and experience allow a chemist to ex¬ 
trapolate from a flat representation with a few 
stereochemical hints — dashed and wedged 
bonds or Z/E double bonds—to a higher-di¬ 
mensional mental representation of a struc¬ 
ture. Chemical representation systems are a 
compromise of several factors, including the 
needs of the chemist, the storage and perfor¬ 
mance characteristics of the chemical data¬ 
base system, and the ultimate 3D reality of 
chemical structures. 

2.1 Types of Chemical Entities 

There are several ways to look at chemical rep¬ 
resentation. One approach is to classify ac¬ 
cording to the type of chemical data that is 
stored. The most basic types of chemical struc¬ 
ture data are shown in Fig. 9.4, including the 
following. 

2.1.1 Sequences. For linear chemical systems, such as DNA, RNA, and
proteins, the sequence of subunits (nucleotide bases or amino acids)
provides most of the information about the structure. The deciphering
of the human genome and the exploding interest in bioinformatics as a
means of identifying new drug targets mean that the use of sequence
data will continue to grow. The use of a sequence representation
depends on a natural "vocabulary" of fixed building blocks. This
vocabulary consists of nucleotides in the case of nucleic acids and of
amino acids in the case of proteins. If any of the building blocks are
unique, or even if the bonding attachment between building blocks
differs, a simple sequence notation is not possible or it becomes more
complex.

Figure 9.4. Basic types of 2D chemical structure data. Panels: sequences, names, and linear notations
(1-dimensional information); structures and reactions (2-dimensional information); 3D models
(3-dimensional information). The amount of information and the complexity of searching increase with
the dimensionality of the data.

2.1.2 2D Structures. When the building blocks are unique or when
dealing with the large variety of ordinary chemical structures, a 2D
representation is used. In mathematical terms, this is a "graph" of
the structure, which consists of a set of "nodes" (atoms) connected by
"edges" (bonds). The important atom information includes atom type
(symbol or atomic number), its 2D coordinates, formal charge, valence
state, atom stereochemistry, and isotope information. Note that atom
stereochemistry can be local (i.e., relative) or it may follow
Cahn-Ingold-Prelog (CIP) conventions. Local atom stereochemistry gives
the clockwise or counter-clockwise direction of the attachment of
neighboring atoms when viewed from some reference attached atom, often
a hydrogen atom or the lowest atomic numbered atom (37). The order of
atoms in the rotation usually depends on atomic number. CIP
stereochemistry is the familiar "R,S" nomenclature that relates the
stereochemistry of the given atom to the entire structure (38). CIP
stereochemistry requires analyzing the entire structure to determine
the stereochemistry values. It can occasionally be ambiguous, and if
any part of the structure changes, the CIP stereochemistry on distant
atoms in the structure may switch. For these reasons, it is common in
chemical databases to store local atom stereochemistry, but to
perceive CIP stereochemistry "on the fly."

A particular problem with relative stereochemistry is that a given
combination of "up" and "down" bonds on a structure implies a mixture
of at least two stereoisomers. If all the centers are specified, the
structure represents at least the two enantiomers. If some of the
stereo centers are not designated, the number of isomers the structure
represents is 2^n, where n is the number of undesignated centers. Some
database vendors (e.g., MDL) allow a "chiral" designation on the
molecule, which indicates that the structure represents only a single
stereoisomer, but does not specify which one. One approach to dealing
with these problems, which is being adopted in MDL programs, is to
allow three kinds of stereo designation at a given tetrahedral center:

1. Absolute: an atom is given a known absolute stereochemistry. If all
the stereo centers are so designated, this represents a single
stereoisomer of the structure, as drawn.

2. Relative as a single stereoisomer: an up or down bond represents
the current relative configuration, with respect to some collection of
other chiral centers in the structure. The structure represents a
single stereoisomer among the possible ones. More than one collection
of stereo centers may be present in the structure.

3. Relative as a mixture of stereoisomers: an up or down bond
represents the current relative configuration, with respect to some
collection of other chiral centers in the structure. Now, however, the
structure represents a mixture of the possible stereoisomers,
considering combinations of the stereo collections that are present.

Examples of these alternatives are shown 
in Fig. 9.5, which shows the present and the 
newer stereochemistry options, using a ste¬ 
roid structure as an example. 
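
The 2^n counting rule for undesignated stereo centers can be made concrete with a short enumeration. The sketch below is a simplification under stated assumptions: each center is treated as an independent up/down choice, the center labels (C1, C2, C3) are hypothetical, and symmetry effects such as meso forms are ignored.

```python
from itertools import product

def stereo_assignments(centers, designated):
    """Enumerate up/down assignments consistent with the designated centers.

    centers    -- list of stereo-center labels (hypothetical names)
    designated -- dict mapping a label to "up" or "down"
    Each undesignated center may be either, giving 2**n combinations.
    """
    free = [c for c in centers if c not in designated]
    results = []
    for choice in product(("up", "down"), repeat=len(free)):
        assignment = dict(designated)
        assignment.update(zip(free, choice))
        results.append(assignment)
    return results

# Three centers, one of them designated: n = 2 undesignated centers.
combos = stereo_assignments(["C1", "C2", "C3"], {"C1": "up"})
print(len(combos))  # 2**2 = 4
```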

Figure 9.5. Defining absolute, relative collection, and relative single configuration stereochemistry.
The older convention depends on a "chiral" flag on the molecule to specify whether a given structure
represents one or several stereoisomers. In the newer convention, collections of stereo centers can be
defined, and they can be designated absolute, relative-part-of-a-mixture, or
relative-single-configuration. Panels (structure labels include "Chiral" and "RelMix1"): current
convention, a single stereoisomer with known absolute configuration; a single stereoisomer whose
absolute configuration is known; a mixture of relative stereoisomers; a single stereoisomer of known
relative configuration.

The bond information usually includes the bonding atoms, the bond
type, and bond stereochemistry. Bond types include the common single,
double, triple, and aromatic types. They may also include types that
are unique to the type of structure, including dative, ionic, hydrogen
bonds, etc. The bond stereochemistry for double bonds is usually Z
(zusammen, together), E (entgegen, opposite), or either (indicating an
unknown stereochemistry). For single bonds attached to a chiral or
prochiral center it is typically "up" (wedge or thick bond), "down"
(dashed or dotted bond), or "either" (often a wiggly bond). Some
systems allow the representation of extended stereochemistry, as with
the terminal groups of allene systems, which can show a type of
tetrahedral stereochemistry if you collapse the allene system to a
point. The bonding information (which atoms are attached to which
other atoms, and the bond types) is collected in the "connection
table" of the structure. Table 9.1 shows a simple atom connection
table for camphor. The diagonal elements of the table describe the
type of atom at a given position in the structure. The off-diagonal
elements describe the bonding of that atom with other atoms in the
structure. Some information about a structure can be derived
implicitly from the connection table. This includes the rings that are
present, and the hydrogen atoms that could be attached. When a
structure can be represented by more than one isomer, it is common to
either (1) store multiple isomers in the database, or (2) run a
structure search using a search query that will hit the desired
isomers. This is true for stereoisomers, enantiomers, and tautomers.
Because the connection table is often symmetric, it is possible to
store only, say, the upper triangular part of it in the database.
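
A connection table of this kind can be sketched in a few lines of Python. The atom and bond lists below are taken from the camphor molfile in Table 9.2; the function names, the fixed-valence rule used for implicit hydrogens (C = 4, O = 2), and the ring count from the simple cycle formula are illustrative assumptions, not the method of any particular database system.

```python
# Heavy atoms of camphor, numbered 1-11 as in the molfile of Table 9.2.
ATOMS = ["C", "C", "C", "C", "C", "C", "C", "C", "C", "O", "C"]
# (atom_i, atom_j, bond_order), using 1-based atom numbers.
BONDS = [(1, 2, 1), (2, 3, 1), (1, 4, 1), (4, 5, 1), (5, 3, 1), (3, 6, 1),
         (6, 7, 1), (7, 4, 1), (5, 8, 1), (5, 9, 1), (7, 10, 2), (4, 11, 1)]

def connection_table(atoms, bonds):
    """Square table: atom symbols on the diagonal, bond orders elsewhere."""
    n = len(atoms)
    table = [[0] * n for _ in range(n)]
    for i, symbol in enumerate(atoms):
        table[i][i] = symbol
    for i, j, order in bonds:
        table[i - 1][j - 1] = order
        table[j - 1][i - 1] = order  # the table is symmetric
    return table

def upper_triangle(table):
    """Keep only the diagonal and the entries above it."""
    return [row[i:] for i, row in enumerate(table)]

def implicit_hydrogens(table, valence=None):
    """Hydrogens implied by a fixed-valence rule (a simplifying assumption)."""
    valence = valence or {"C": 4, "O": 2}
    counts = []
    for i, row in enumerate(table):
        used = sum(order for j, order in enumerate(row) if j != i)
        counts.append(valence[row[i]] - used)
    return counts

table = connection_table(ATOMS, BONDS)
hydrogens = implicit_hydrogens(table)
# For a connected structure, rings = bonds - atoms + 1.
ring_count = len(BONDS) - len(ATOMS) + 1
print(ring_count, sum(hydrogens))  # 2 rings, 16 hydrogens (C10H16O)
```

Storing only the upper triangle cuts the 121 entries of the full 11 x 11 table to 66 without losing information.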

2.1.3 Reactions. Chemical reactions ex¬ 
tend the structure representation by adding 
information about what role the structure 
plays in the reaction (reactant, catalyst, sol¬ 
vent, product, etc.). Reaction representation 
may also include information about what 
bonds are made or broken during the reaction, 
and which atoms are involved in reacting cen¬ 
ters. It is also common to use a hierarchical 
organization for reaction information (reac¬ 
tion > variation > reactants, catalysts, sol¬ 
vents, products, etc.). 
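
The hierarchical organization (reaction > variation > components with roles) maps naturally onto nested records. The following is a minimal sketch; the class names, field names, the dichloromethane solvent, and the reaction conditions are hypothetical choices for illustration and are not drawn from any reaction database.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Component:
    smiles: str  # structure of the component
    role: str    # "reactant", "catalyst", "solvent", or "product"

@dataclass
class Variation:
    conditions: str
    components: List[Component] = field(default_factory=list)

    def by_role(self, role):
        """Return the structures that play the given role."""
        return [c.smiles for c in self.components if c.role == role]

@dataclass
class Reaction:
    name: str
    variations: List[Variation] = field(default_factory=list)

# Acid chloride + amine -> amide + HCl, with one hypothetical variation.
amide_formation = Reaction(
    name="amide formation from an acid chloride",
    variations=[
        Variation(
            conditions="room temperature (hypothetical)",
            components=[
                Component("CC(=O)Cl", "reactant"),
                Component("CN", "reactant"),
                Component("ClCCl", "solvent"),
                Component("CC(=O)NC", "product"),
                Component("Cl", "product"),
            ],
        )
    ],
)
print(amide_formation.variations[0].by_role("product"))
```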


2.1.4 3D Models. These extend the structure representation by adding
one or more sets of 3D atomic coordinates for the various
conformations that the molecule can adopt. 3D model representation may
also include additional atom or bond information such as partial
charge or partial bond order. It is common to generate approximate 3D
models from 2D structures using fast abbreviated molecular modeling
and fragment joining methods such as CONCORD, CORINA, and CONVERTER
(39). These programs combine molecular mechanics with rules and
heuristics to generate reasonable 3D structures in a fraction of the
time required by molecular mechanics or quantum mechanics modeling.
Typically, hundreds of structures can be processed per second.
Although the resulting models are not the lowest energy models
possible, they are quite suitable for 3D pharmacophore searching, and
they serve as a good starting point for further optimization.

Table 9.1 Connection Table for D-Camphor

Recently, with the use of combinatorial and high-throughput chemistry,
more general types of structure representation, so-called chemical
libraries, have become common (Fig. 9.6). These are typically used to
represent mixtures and generic structures.

2.1.5 Mixtures. Mixtures are useful to represent isomers,
formulations, and the products of reactions. Their representation
usually requires adding data to specify percent or amount content in
the mixture for each component.

Figure 9.6. Chemical structure data for high-throughput chemistry (chemical libraries). Panels:
mixtures; generic structures, for which the number of specifics = n(R1) × n(R2) × n(R3), hence
"combinatorial." The generic structure representation is often referred to as a Markush structure.

2.1.6 Generic Structures. Generic or Markush structures are commonly
used to represent structures for patent purposes. Since the
introduction of combinatorial chemistry, generic structures and
generic reactions have become a standard means of representing
potentially huge numbers of specific compounds in a highly compact
representation that is familiar to the chemist (40). The central
structure of a generic, which is common to all the structures it
represents, is commonly called the "root" or "parent." The variable
parts of the structure (R1, R2, etc.) are referred to as the
"Rgroups." The exact substituents that make up the various Rgroups
(e.g., -Cl, -Br, -OH) are referred to as the "members" of the Rgroup.
Finally, a specific combination of root and Rgroup members, which
constitutes a single, real structure, is referred to as a specific or
"enumerated" structure. Some chemical computations, like property and
similarity calculations, can be performed on the generic structure
without enumerating all the specific structures (41).
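
Enumeration of a generic structure amounts to a Cartesian product over the Rgroup members, so the number of specifics is the product of the member counts. The sketch below assumes a hypothetical root written as a SMILES template with three Rgroup positions; the root, the member lists, and the use of plain string substitution are illustrative simplifications and not how commercial Markush systems are implemented.

```python
from itertools import product
from math import prod

# Hypothetical root with three Rgroup positions marked {R1}, {R2}, {R3}.
ROOT = "c1cc({R1})ccc1C(=O)N({R2}){R3}"
# Hypothetical members of each Rgroup, written as SMILES fragments.
RGROUPS = {
    "R1": ["Cl", "Br", "O"],
    "R2": ["C", "CC"],
    "R3": ["[H]", "C"],
}

def count_specifics(rgroups):
    """Number of specifics = n(R1) * n(R2) * n(R3) * ..."""
    return prod(len(members) for members in rgroups.values())

def enumerate_specifics(root, rgroups):
    """Yield one SMILES string per combination of Rgroup members."""
    names = sorted(rgroups)
    for combo in product(*(rgroups[name] for name in names)):
        yield root.format(**dict(zip(names, combo)))

specifics = list(enumerate_specifics(ROOT, RGROUPS))
print(count_specifics(RGROUPS), len(specifics))  # 12 12
```

The count is available without building a single specific structure, which is the practical appeal of computing on the generic form when libraries become large.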

The remaining types of chemical data that 
need representation include substances and 
search queries. 

2.1.7 Substances. Less common in drug 
discovery, but very useful for material science 
and polymer chemistry, is the ability to store 
"substances." These include unspecified or 
uncertain chemical structures, polymers, and 
other chemical entities that cannot be classed 
with the other chemical representations (42). 
Polymers pose particular problems, as dis¬ 
cussed in the article by Schultz and Wilks (43). 

2.1.8 Search Queries. For all types of chemical representation, there
are query representations that can be applied to a database to return
a list of structures which match or "fit" the query, or that the query
"hits" in the database. The same chemical drawing programs that are
used to input structures can commonly be used to input chemical
structure queries. These drawing programs currently include several
programs in the commercial and public domains (44). A comparison of
popular drawing programs has recently appeared on the Internet (45).
Query structures often contain generalized atom types, bond types, and
ring types. They may specify the required presence or absence of
certain atom types or functional groups. In the case of 3D models,
queries can be devised to represent pharmacophores for certain types
of therapeutic activity (46). An important distinction must often be
made between the query representation of a pharmacophore used for 3D
searching and the conceptual pharmacophore used for drug development.

2.2 Types of Chemical Representation 

A second way of looking at chemical represen¬ 
tation is to consider the manner in which the 
chemical structure data is organized and ex¬ 
changed, either in some file format or in a da¬ 
tabase. The most common ways of represent¬ 
ing structures and reactions include the 
following. 

2.2.1 Linear Notation. One of the earliest forms of chemical structure
representation is Wiswesser line notation (WLN), developed in 1946.
This notation used short letter codes to represent functional groups
in molecules (47). An alternative early notation is the Beilstein
ROSDAL string (48). These two formats are not used much today, having
been replaced by the Daylight SMILES notation (49) and its extensions
(50). Figure 9.7 shows a drug-like molecule along with WLN, SMILES,
and ROSDAL notation. Also shown is a simple chemical reaction
represented in SMILES. Note that SMILES and other linear notation
schemes do not include 2D coordinates for display of the structure.
These are either stored separately or generated on the fly (51). The
SMILES notation has become especially popular for property estimation
programs, because atom coordinates are not usually needed for
connection table-based calculations. It is a very convenient method
for web-based input of structures for property calculations (52).

WLN: L66J BMR& DSWQ IN1&1

ROSDAL: 1=-5-=10=5,10-1,1-11N-12-=17=12,3-18S-19O,18-20O,18=21O,8-22N-23,22-24

SMILES: OS(=O)(=O)c1cc2ccc(cc2c(c1)Nc1ccccc1)N(C)C

SLN: OS(=O)(=O)C[1]:C:C[2]:C:C:C(:C:C(:@2):C(:C(:@1))NC[3]:C:C:C:C:C(:@3))N(C)C

CHIME: 3aQf713AsUwQDjlMwyMWrSA7AOxHeqiAAPWRmMrZSZIJjTrAEfcsXH1JTUf...

Reaction SMILES: [C:1](=[O:2])[Cl:3].[H:99][N:4]([H:100])[C:0]>>[C:1](=[O:2])[N:4]([H:100])[C:0].[Cl:3][H:99]

Figure 9.7. Various linear notation schemes for chemical representation. Some contain only atom types
and connectivity (WLN, ROSDAL, SMILES, SLN) and are chemist-readable. Others are compressed versions
of molecule file formats (CHIME) and are meant for computer interpretation.

Note that the order of atoms in most linear notations is arbitrary,
depending on where in the molecule the notation generator (program or
chemist) starts. For this reason, some linear notations have a
canonical (or "uniquified") form that places the atoms in a
topological order, usually reflecting their degree of branching, the
types of neighboring atoms and bonds, etc. This canonical ordering of
the atoms reduces any user-input ordering to the same string. It can
then be used for exact-match lookup of the structure, regardless of
how it was drawn or typed. The SMILES notation has also been extended
to include reactions as shown in Fig. 9.7 (53). Occasionally, other
linear notations are described (54).
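
The purpose of a canonical form, one key per structure no matter how the atoms were numbered, can be demonstrated with a brute-force sketch. The function below simply tries every renumbering of the atoms and keeps the smallest resulting description; this is an assumption-laden teaching device that is practical only for very small molecules, and it is not the algorithm used to generate canonical SMILES.

```python
from itertools import permutations

def canonical_key(atoms, bonds):
    """Smallest (atom symbols, bond list) over all atom renumberings.

    atoms -- list of element symbols (hydrogens implicit)
    bonds -- list of (i, j, order) with 0-based atom indices
    """
    n = len(atoms)
    best = None
    for perm in permutations(range(n)):
        # perm[old_index] gives the new index of that atom.
        symbols = [None] * n
        for old, new in enumerate(perm):
            symbols[new] = atoms[old]
        relabeled = sorted(
            (min(perm[i], perm[j]), max(perm[i], perm[j]), order)
            for i, j, order in bonds
        )
        key = (tuple(symbols), tuple(relabeled))
        if best is None or key < best:
            best = key
    return best

# Acetic acid entered in two different atom orders: CC(=O)O and OC(=O)C.
acid_a = (["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
acid_b = (["O", "C", "O", "C"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
# Methyl formate, COC=O, has the same formula but different connectivity.
ester = (["C", "O", "C", "O"], [(0, 1, 1), (1, 2, 1), (2, 3, 2)])

print(canonical_key(*acid_a) == canonical_key(*acid_b))  # True
print(canonical_key(*acid_a) == canonical_key(*ester))   # False
```

Because the key depends only on the structure, it can serve as the lookup value for exact-match searching; production systems reach the same goal with far more efficient graph-ordering algorithms.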

2.2.2 Tabular Storage. To preserve more specific information about
atoms and bonds, such as coordinates, stereochemistry, charge, and
isotope number, it is necessary to store molecule information in a
tabular format. Each row of the table typically contains all the
information about a single atom or bond. In some formats, the atom and
bond information is combined on a single line. Table 9.2 shows three
common file formats for a simple structure. In the MDL molfile format,
the atom and bond information is separated into separate blocks. In
the HyperChem HIN file format, the bond information is mixed with the
atom information, resulting in fewer records in the file. In the PDB
format, the atoms can be assigned to residues. Descriptions of various
formats can be found in the reference manuals for chemical management
and molecular modeling programs or in the literature (55). The systems
that manage reactions typically have their own file formats as well.
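
The separate atom and bond blocks of a molfile lend themselves to fixed-width reading and writing. The sketch below writes and re-reads only the core fields of a V2000-style block (counts line, atom coordinates and symbols, bond atoms and orders); the function names are hypothetical, the remaining fields are written as zeros, and features such as charges, stereo flags, and property lines are deliberately left out.

```python
def write_molblock(name, atoms, bonds):
    """atoms: list of (symbol, x, y, z); bonds: list of (i, j, order), 1-based."""
    lines = [name, "  sketch", ""]  # three header lines
    # Counts line: atom and bond counts in 3-character fields.
    lines.append(f"{len(atoms):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000")
    for symbol, x, y, z in atoms:
        # Three 10-character coordinates, a space, then a 3-character symbol.
        lines.append(f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3}" + " 0" + "  0" * 11)
    for i, j, order in bonds:
        lines.append(f"{i:3d}{j:3d}{order:3d}" + "  0" * 4)
    lines.append("M  END")
    return "\n".join(lines)

def read_molblock(text):
    """Recover (name, atoms, bonds) by slicing the fixed-width columns."""
    lines = text.split("\n")
    counts = lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z = (float(line[k:k + 10]) for k in (0, 10, 20))
        atoms.append((line[31:34].strip(), x, y, z))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return lines[0], atoms, bonds

# Acetic acid, using the 2D coordinates listed in Table 9.3.
acid_atoms = [("C", 0.27, 0.1217, 0.0), ("C", -1.27, 0.1246, 0.0),
              ("O", 1.0623, -1.2937, 0.0), ("O", 1.1008, 1.4332, 0.0)]
acid_bonds = [(1, 2, 1), (1, 3, 1), (1, 4, 2)]

block = write_molblock("acetic acid", acid_atoms, acid_bonds)
print(block)
```

Slicing by column position rather than splitting on whitespace matters for larger structures, because three-digit atom numbers fill their fields completely and leave no separating blanks.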

Table 9.2 Tabular Molecule File Formats

MDL Molfile format:

D-Camphor
  -ISIS-  03130218162D
2D molfile
 11 12  0  0  0  0  0  0  0  0999 V2000
   -2.0625   -1.1833    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5583   -0.4167    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.4208    0.1500    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
   -0.8292   -0.7542    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
   -0.7042    1.1667    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
    0.9917   -0.4333    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.4667   -1.2167    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.5237    1.7332    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.9208    1.6686    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.9875   -2.2803    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -1.6004   -2.0787    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
  1  4  1  0  0  0  0
  4  5  1  0  0  0  0
  5  3  1  0  0  0  0
  3  6  1  0  0  0  0
  6  7  1  0  0  0  0
  7  4  1  0  0  0  0
  5  8  1  0  0  0  0
  5  9  1  0  0  0  0
  7 10  2  0  0  0  0
  4 11  1  1  0  0  0
M  END

HyperChem HIN file format:

mol 1 C:\TEMP\DCAMPHOR.HIN
atom 1 - C ** - 0 0.0000 0.8445 0.0000 2 2 s 4 s
atom 2 - C ** - 0 0.3881 1.4347 0.0000 2 1 s 3 s
atom 3 - C ** - 0 1.2639 1.8710 0.0000 3 2 s 5 s 6 s
atom 4 - C ** - 0 0.9494 1.1749 0.0000 4 1 s 5 s 7 s 11 s
atom 5 - C ** - 0 1.0457 2.6537 0.0000 4 4 s 3 s 8 s 9 s
atom 6 - C ** - 0 2.3513 1.4219 0.0000 2 3 s 7 s
atom 7 - C ** - 0 1.9471 0.8189 0.0000 3 6 s 4 s 10 d
atom 8 - C ** - 0 1.9910 3.0899 0.0000 1 5 s
atom 9 - C ** - 0 0.1090 3.0401 0.0000 1 5 s
atom 10 - O ** - 0 2.3481 0.0000 0.0000 1 7 d
atom 11 - C ** - 0 0.3557 0.1552 1.0000 1 4 s
endmol 1

Protein Data Bank format:

HEADER    PROTEIN
COMPND    c:\temp\dcamphor.pdb
AUTHOR    GENERATED BY BABEL 1.6
ATOM      1  C   UNK     1      -2.063  -1.183   0.000  1.00  0.00
ATOM      2  C   UNK     1      -1.558  -0.417   0.000  1.00  0.00
ATOM      3  C   UNK     1      -0.421   0.150   0.000  1.00  0.00
ATOM      4  C   UNK     1      -0.829  -0.754   0.000  1.00  0.00
ATOM      5  C   UNK     1      -0.704   1.167   0.000  1.00  0.00
ATOM      6  C   UNK     1       0.992  -0.433   0.000  1.00  0.00
ATOM      7  C   UNK     1       0.467  -1.217   0.000  1.00  0.00
ATOM      8  C   UNK     1       0.524   1.733   0.000  1.00  0.00
ATOM      9  C   UNK     1      -1.921   1.669   0.000  1.00  0.00
ATOM     10  O   UNK     1       0.988  -2.280   0.000  1.00  0.00
ATOM     11  C   UNK     1      -1.600  -2.079   0.000  1.00  0.00
CONECT    1    2    4
CONECT    2    1    3
CONECT    3    2    5    6
CONECT    4    1    5    7   11
CONECT    5    4    3    8    9
CONECT    6    3    7
CONECT    7    6    4   10
CONECT    8    5
CONECT    9    5
CONECT   10    7
CONECT   11    4
MASTER        0    0    0    0    0    0    0    0   11    0   11    0
END

Both linear and tabular formats are capable of being transmitted over
a network between computers. This allows passing structure information
from a server to a workstation for display purposes. It is common to
compress and/or encrypt the chemical structure information before it
is transmitted, and then have the workstation or display program
uncompress or decrypt the resulting structures. This is done for
performance and for security. In MDL systems, the Chime linear format
is used to transmit structures and reactions, whereas Daylight systems
simply use the SMILES representation and depict the structure on the
fly (Fig. 9.7).
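
The idea of packing a tabular structure record into a compact, single-line string for transmission can be sketched with the zlib and base64 modules of the Python standard library. This is a generic illustration only: it is not the Chime encoding, the function names are hypothetical, and it covers compression but not encryption, which would require a separate cipher.

```python
import base64
import zlib

def pack_structure(text):
    """Compress a structure record and encode it as one line of ASCII."""
    raw = zlib.compress(text.encode("utf-8"), 9)
    return base64.b64encode(raw).decode("ascii")

def unpack_structure(packed):
    """Reverse pack_structure on the receiving side."""
    return zlib.decompress(base64.b64decode(packed)).decode("utf-8")

# A molfile-like atom block: eleven carbon atoms on a line (made-up data).
atom_lines = [
    f"{0.5 * i:10.4f}{0.0:10.4f}{0.0:10.4f} C  " + " 0" + "  0" * 11
    for i in range(11)
]
record = "\n".join(atom_lines)

packed = pack_structure(record)
print(len(record), len(packed))
```

Tabular formats compress well because most columns repeat from row to row, which is one reason a compressed linear string is attractive for sending structures to a display client.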

2.2.3 Graphical Representation. Occasionally it is desirable to store
chemical structures as "pictures," usually for document purposes. For
example, some chemical drawing packages and many molecular modeling
packages can store structures as the following:

• WordPerfect or Microsoft Word document (.doc files)

• Encapsulated PostScript (.eps files)

• Windows metafile (.wmf files)

• A proprietary sketch (MDL .skc files)

• A variety of compressed graphics formats, including JPEG (.jpg
files), bitmap (.bmp files), GIF (.gif files), and TIFF (.tif files)


Often, the graphical format allows the con¬ 
nection table to be stored and transferred 
transparently with the image—through the 
computer's clipboard, for instance. This al¬ 
lows the receiving program to "interpret" the 
image as a chemical structure and manipulate 
it accordingly. 

2.2.4 Markup Languages. The Internet has
spawned a host of new "languages" that
facilitate the exchange of information. The
most common of these are HTML (hypertext
markup language) and XML (extensible
markup language). A variation of XML that is
designed for chemical information exchange is
the Chemical Markup Language, CML (56).
Although it is not widely used as of this writing,
it bears watching as more web-based chemical
information platforms become available.
One problem with markup languages is that
they are verbose compared with structure
files, and they are difficult for chemists to read
(although they are not usually meant to be
read by chemists). This is evident in Table 9.3,
which shows the CML for acetic acid.
By comparison, the SMILES for acetic acid is
simply "CC(=O)O".

Table 9.3 Chemical Markup Representation of Acetic Acid

<molecule convention="MDLMol" id="acetate" title="ACETATE">
  <date day="23" month="11" year="1995"/>
  <atomArray>
    <atom id="a1">
      <string builtin="elementType">C</string>
      <float builtin="x2">0.27</float>
      <float builtin="y2">0.1217</float>
    </atom>
    <atom id="a2">
      <string builtin="elementType">C</string>
      <float builtin="x2">-1.27</float>
      <float builtin="y2">0.1246</float>
    </atom>
    <atom id="a3">
      <string builtin="elementType">O</string>
      <float builtin="x2">1.0623</float>
      <float builtin="y2">-1.2937</float>
    </atom>
    <atom id="a4">
      <string builtin="elementType">O</string>
      <float builtin="x2">1.1008</float>
      <float builtin="y2">1.4332</float>
    </atom>
  </atomArray>
  <bondArray>
    <bond id="b1">
      <string builtin="atomRef">a1</string>
      <string builtin="atomRef">a2</string>
      <string builtin="order">1</string>
    </bond>
    <bond id="b2">
      <string builtin="atomRef">a1</string>
      <string builtin="atomRef">a3</string>
      <string builtin="order">1</string>
    </bond>
    <bond id="b3">
      <string builtin="atomRef">a1</string>
      <string builtin="atomRef">a4</string>
      <string builtin="order">2</string>
    </bond>
  </bondArray>
</molecule>

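
The difference in verbosity can be made concrete with a short script. The following is a minimal sketch, assuming the early CML dialect printed in Table 9.3 (with coordinates omitted for brevity); `read_cml` is a hypothetical helper, not part of any CML library, and it uses only Python's standard XML parser.

```python
import xml.etree.ElementTree as ET

# Abbreviated form of the Table 9.3 markup (coordinates left out).
CML = """<molecule id="acetate">
  <atomArray>
    <atom id="a1"><string builtin="elementType">C</string></atom>
    <atom id="a2"><string builtin="elementType">C</string></atom>
    <atom id="a3"><string builtin="elementType">O</string></atom>
    <atom id="a4"><string builtin="elementType">O</string></atom>
  </atomArray>
  <bondArray>
    <bond id="b1"><string builtin="atomRef">a1</string>
      <string builtin="atomRef">a2</string><string builtin="order">1</string></bond>
    <bond id="b2"><string builtin="atomRef">a1</string>
      <string builtin="atomRef">a3</string><string builtin="order">1</string></bond>
    <bond id="b3"><string builtin="atomRef">a1</string>
      <string builtin="atomRef">a4</string><string builtin="order">2</string></bond>
  </bondArray>
</molecule>"""

SMILES = "CC(=O)O"


def read_cml(text):
    """Return ({atom id: element}, [(atom id, atom id, bond order)])."""
    root = ET.fromstring(text)
    atoms = {}
    for atom in root.iter("atom"):
        for item in atom.findall("string"):
            if item.get("builtin") == "elementType":
                atoms[atom.get("id")] = item.text
    bonds = []
    for bond in root.iter("bond"):
        strings = bond.findall("string")
        refs = [s.text for s in strings if s.get("builtin") == "atomRef"]
        order = next(int(s.text) for s in strings if s.get("builtin") == "order")
        bonds.append((refs[0], refs[1], order))
    return atoms, bonds


atoms, bonds = read_cml(CML)
print(atoms, bonds)
print("CML characters:", len(CML), "SMILES characters:", len(SMILES))
```

Both representations carry the same four heavy atoms and three bonds; the markup simply spends far more characters labeling each value, which is the price of being self-describing.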

2.3 Chemical Structure File Conversion

Many chemical information management systems,
especially modeling programs, permit a
chemist to import and export structures using
a variety of file formats. Commercial programs
designed specifically for file conversion are
available (57). A widely used public domain
program, Babel, is available in source code and
in a Windows version (58). It is being extended
by the "OpenBabel" programming project
(59). It is possible, with a fair amount of
accuracy, to convert a chemical structure from a
connection table format to an acceptable
IUPAC name (60). The reverse conversion of
names to 2D structures is also possible (61).

Figure 9.8. Structure representation with
additional information, including atom (partial charge)
and fragment (percent composition) data and
Markush structure features.

2.4 Representing Nonstructural Chemical 
Data 

Nonstructural chemical data includes any tex¬ 
tual, numeric, or binary data that is not di¬ 
rectly a part of the chemical structure. It in¬ 
cludes the following: 

• Whole-molecule data such as physicochemi¬ 
cal properties, spectral data, literature cita¬ 
tions, availability, biological or therapeutic 
activity, etc. In either the molecule file or in 
a database, data are typically maintained in 
fields that are separate from the structure 
field, but linked by some identifier. 

• Atom, bond, or fragment-based data such as 
partial charge, component fraction, second¬ 
ary structure, various fragment-based phys¬ 
icochemical and QSAR properties, etc. Be¬ 
cause these data are linked to particular 
atoms, bonds, fragments, or components in 
the structure, they are typically stored along 
with some indication of the substructural 
features with which they are associated. 

Figure 9.8 shows a structure that contains
atom (partial charge) and fragment (component
percent) data of various types, as well as
Markush structure features (R1). Table 9.4
shows the corresponding structure-data file
(MDL SDfile). Each Rgroup member appears in
its own "submolfile" in this file representation.
So-called Sgroup data appear as part of
the molfile, along with the name of the data
field, its value, location on the display, and the
atoms and bonds that bear the data.
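
The distinction between whole-molecule data and atom- or fragment-linked data can be sketched as a small record type. This is an illustration only, under stated assumptions: the class names `DataSgroup` and `MoleculeRecord`, the registry identifier, and the property values are hypothetical, and the assignment of the 60% and 40% values to atoms 1-10 and 11-19 is one reading of the Sgroup lines in Table 9.4.

```python
from dataclasses import dataclass, field


@dataclass
class DataSgroup:
    """A named value attached to specific atoms of the structure."""
    field_name: str
    value: str
    atoms: tuple


@dataclass
class MoleculeRecord:
    registry_id: str                                 # links structure and data
    molfile: str                                     # the structure field itself
    properties: dict = field(default_factory=dict)   # whole-molecule data
    sgroups: list = field(default_factory=list)      # atom/fragment data

    def data_for_atom(self, atom_number):
        """Return every substructure-linked value that covers one atom."""
        return {sg.field_name: sg.value
                for sg in self.sgroups if atom_number in sg.atoms}


record = MoleculeRecord(
    registry_id="EXAMPLE-0001",            # hypothetical identifier
    molfile="(molfile text goes here)",
    properties={"supplier": "in-house"},   # hypothetical whole-molecule field
    sgroups=[
        DataSgroup("PCHARGE", "0.12", (7,)),
        DataSgroup("PCHARGE", "0.05", (17,)),
        DataSgroup("PCT", "60%", tuple(range(1, 11))),
        DataSgroup("PCT", "40%", tuple(range(11, 20))),
    ],
)

print(record.data_for_atom(7))
print(record.data_for_atom(17))
```

Whole-molecule values live in ordinary fields keyed by the identifier, whereas each Sgroup-style value must also carry the list of atoms it describes.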

3 STORING AND SEARCHING CHEMICAL 
STRUCTURES AND REACTIONS 

In the simplest sense, searching chemical
information consists of (1) finding structures or
reactions that meet the chemist's search
criteria and/or (2) finding data that meet the
search criteria. Data searching (numbers and
text) is a well-established informatics activity,
supported by spreadsheets, word processors,
and relational database systems. Chemical
structures and reactions are a unique form of
data. Searching for full or partial matches to
structures, models, and reactions requires
highly specialized databases and search
techniques.

3.1 Storing Chemical Information 
in Databases 

When they were first developed, chemical
structure databases consisted of record-oriented
flat files, much like a spreadsheet whose
columns have each been cut out and placed in
a separate file. This organization has
limitations in searching, access, and efficiency of
storage. Also, it is not the most appropriate
form for storing generic structures and
reactions, which are more hierarchical in nature.
As a result, since the 1990s, chemical
information has increasingly been stored in
commercial relational database systems, chiefly
Oracle and Microsoft Access. Relational
storage has the added advantage of combining
chemical structure storage with biological
data and inventory data (location, cost, units
on hand, etc.) that are often stored in the
corporation's relational databases. One of the
first reports of the relational storage of
structures was by Hagadone and Lajiness, who
modified the Upjohn COUSIN system (62).

An example of a current commercial rela¬ 
tional chemical database is seen in Fig. 9.9, 
which shows the organization of a basic ISIS 
chemical database. Each labeled item in the 
figure is a table in an Oracle relational data¬ 
base. The tables in the database consist of the 
following. 


Table 9.4 Example Molfile Showing Markush Features, Atom, and Fragment Data

R1 = Br

$MDL  REV  1 29AUG0117:47
$MOL
$HDR
Figure 8 molfile
  MACCS-II08290117472D 1   0.00487     0.00000     0

$END HDR
$CTAB
 19 19  0  0  0  0              25 V2000
   -5.3301    1.0237    0.0000 C   0  0  0  0  0  0
   -5.3323   -0.5232    0.0000 C   0  0  0  0  0  0
   -3.9956   -1.2952    0.0000 C   0  0  0  0  0  0
   -2.6560   -0.5224    0.0000 C   0  0  0  0  0  0
   -2.6615    1.0307    0.0000 C   0  0  0  0  0  0
   -3.9990    1.7956    0.0000 C   0  0  0  0  0  0
   -3.9881    3.2993    0.0000 C   0  0  0  0  0  0
   -3.9960   -2.8378    0.0000 R#  0  0  0  0  0  0
   -2.6701    4.1134    0.0000 C   0  0  0  0  0  0
   -1.3283    1.8070    0.0000 C   0  0  0  0  0  0
    0.8559    0.9069    0.0000 C   0  0  0  0  0  0
    0.8538   -0.6400    0.0000 C   0  0  0  0  0  0
    2.1904   -1.4121    0.0000 C   0  0  0  0  0  0
    3.5299   -0.6393    0.0000 C   0  0  0  0  0  0
    3.5247    0.9138    0.0000 C   0  0  0  0  0  0
    2.1870    1.6787    0.0000 C   0  0  0  0  0  0
    2.1823    3.2214    0.0000 C   0  0  0  0  0  0
    3.5161    3.9966    0.0000 C   0  0  0  0  0  0
    2.1900   -2.9547    0.0000 R#  0  0  0  0  0  0
  8  3  1  0  0  0
  2  3  1  0  0  0
  7  9  2  0  0  0
  5 10  1  0  0  0
  3  4  2  0  0  0
 11 12  2  0  0  0
  4  5  1  0  0  0
 12 13  1  0  0  0
 13 14  2  0  0  0
  5  6  2  0  0  0
 14 15  1  0  0  0
  6  1  1  0  0  0
 15 16  2  0  0  0
 16 11  1  0  0  0
  1  2  2  0  0  0
 16 17  1  0  0  0
  6  7  1  0  0  0
 17 18  2  0  0  0
 13 19  1  0  0  0
M  RGP  2   8   1  19   1
M  STY  6   1 GEN   2 GEN   3 DAT   4 DAT   5 DAT   6 DAT
M  SLB  6   1   1   2   2   3   3   4   4   5   5   6   6
M  SAL   1  9  11  12  13  14  15  16  17  18  19
M  SDI   1  4    0.0515   -3.7303    0.0515    4.7961
M  SDI   1  4    4.8009    4.7961    4.8009   -3.7303
M  SAL   2 10   1   2   3   4   5   6   7   8   9  10
M  SDI   2  4   -6.1377   -3.6181   -6.1377    4.9083
M  SDI   2  4   -0.5469    4.9083   -0.5469   -3.6181
M  SAL   3  1   7
M  SDT   3 PCHARGE                        F    MQ
M  SDD   3    -5.4645    3.7303    DA    ALL  1       5
M  SED   3 0.12
M  SDT   4 PCT                            F    MQ
M  SDD   4    -0.9021   -4.9083    DA    ALL  1       5
M  SED   4 60%
M  SDT   5 PCT                            F    MQ
M  SDD   5     5.2122   -4.7961    DA    ALL  1       5
M  SED   5 40%
M  SAL   6  1  17
M  SDT   6 PCHARGE                        F    MQ
M  SDD   6     0.5002    3.9173    DA    ALL  1       5
M  SED   6 0.05
M  SPL  2   4   2   5   1
M  END
$END CTAB
$RGP
  1
$CTAB
  2  1  0  0  0  0               2 V2000
   -3.7453    0.0472    0.0000 C   0  0  0  0  0  0
   -2.5668    1.0385    0.0000 Cl  0  0  0  0  0  0
  1  2  1  0  0  0
M  APO  1   1   1
M  END
$END CTAB
$CTAB
  3  2  0  0  0  0               2 V2000
   -5.0122    0.6013    0.0000 C   0  0  0  0  0  0
   -3.5311    1.0229    0.0000 C   0  0  0  0  0  0
   -2.6113   -0.2666    0.0000 Br  0  0  0  0  0  0
  1  2  1  0  0  0
  3  2  1  0  0  0
M  APO  1   1   1
M  END
$END CTAB
$END RGP
$END MOL


• The master "data dictionary" table, which 
describes all the objects in the database, as 
well as some parameters that are specific to 
the database (exact match criteria, version 
of the database, etc.). This is sometimes re¬ 
ferred to as "metadata" or "data about data." 


• A handful of tables that contain database
parameters. These include substructure
search key definitions, the periodic table
used with the database, and a list of salt
moieties that can be considered during
searches.

• Structure and data storage is shown on the
right. A structure table contains the
structures, their internal identifiers, and their
external identifiers, if any. The structures
are stored in a compact binary representation
that includes the connection table, the
coordinates, the ring information, and any
stereochemical, valence, isomer, isotope, or
bond information. Certain types of
structure-specific information such as polymer or
component designations are stored here,
whereas other types of structure-specific
information (atom- or bond-specific data, and
more verbose text data) are stored in their
own tables, referenced by the internal
identifier and the atom or bond numbers to
which the data correspond. A formula table
contains the molecular formula and various
atom and atom-type indexes to enhance
formula searching and sorting.

• A table of substructure keys containing a
binary or text string of the substructure
fingerprint that was identified in the given
structure at registration time. These keys
represent the presence of either simple
functional groups (e.g., phenyl ring, carbonyl),
or more complex atom/bond combinations
(e.g., carbonyl separated from a secondary
amine by three bonds). In ISIS, a set
of 166 searchable keys can be explicitly used
as filters for structure searching. A larger
set of 960 keys is used for similarity
calculations. For 3D models, it is common to
generate 3D pharmacophore keys, which encode
all the possible 2- and 3-point pharmacophores
represented in the structure,
sometimes considering multiple conformations.

• A third kind of information includes indexes
to enable structure and substructure
searching. A "flexmatch" table contains a
numerical hash (see Glossary) of certain
features of the molecule, including
stereochemistry, charge, and isotopes. This table can be
used to retrieve a set of candidate structures
quickly for exact match verification (63). It
can also be used for "fuzzy" exact-match
searches to retrieve tautomers and isomers
of the input structure.

• Another index table contains a "fastsearch"
index. This contains a single balanced tree
(see Glossary) of all the substructural
fragments found in structures in the database,
up to a fixed path length. These are stored in
a highly compressed binary format (Fig.
9.10). Similar approaches have appeared in
the literature (64). Leaf nodes in the tree
contain identifiers of specific structures in
the database (simplified in Fig. 9.10). An
exact match or substructure search consists of
traversing the tree to find structures in the
database that have substructural fragments
in common with the query structure.
Because the fastsearch index is large, often as
large as the rest of the structure database,
updating it for the addition or removal of
structures is time consuming.

Figure 9.10. Simplified ISIS Fastsearch index: ethanol is a leaf node that can be reached from
several substructure nodes.
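
The idea of indexing structures by their fragments up to a fixed path length can be sketched in a few lines. This is not the ISIS implementation: a plain dictionary stands in for the balanced tree, bond orders are ignored, and the function names (`fragment_paths`, `build_index`, `candidates`) and the three-molecule database are hypothetical.

```python
from collections import defaultdict


def fragment_paths(atoms, bonds, max_atoms=3):
    """Enumerate element-label paths of 1..max_atoms connected atoms."""
    adj = defaultdict(list)
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    found = set()

    def walk(path):
        labels = tuple(atoms[i] for i in path)
        found.add(min(labels, labels[::-1]))  # same path read either way
        if len(path) < max_atoms:
            for nxt in adj[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in atoms:
        walk([start])
    return found


def build_index(database):
    """Map each fragment to the identifiers of structures containing it."""
    index = defaultdict(set)
    for name, (atoms, bonds) in database.items():
        for frag in fragment_paths(atoms, bonds):
            index[frag].add(name)
    return index


def candidates(index, atoms, bonds):
    """Structures that contain every fragment found in the query."""
    hits = None
    for frag in fragment_paths(atoms, bonds):
        ids = index.get(frag, set())
        hits = set(ids) if hits is None else hits & ids
    return hits if hits else set()


DATABASE = {
    "ethanol": ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),
    "acetic acid": ({0: "C", 1: "C", 2: "O", 3: "O"}, [(0, 1), (1, 2), (1, 3)]),
    "propane": ({0: "C", 1: "C", 2: "C"}, [(0, 1), (1, 2)]),
}
INDEX = build_index(DATABASE)

cco_hits = candidates(INDEX, {0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
oco_hits = candidates(INDEX, {0: "O", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
print(cco_hits, oco_hits)
```

The sketch also shows why such an index is costly to maintain: registering or removing one structure touches the entry of every fragment that structure contains.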

This relational chemical database format is 
extended in ISIS to include 3D models, generic 
structures, and most recently, reactions. In 
these cases, additional "trees" in the database 
hierarchy connect 2D structures with 3D mod¬ 
els, connect root structures with correspond¬ 
ing Rgroup members, or connect molecules 
with reactions. 

Other relational structure/reaction database
systems are available commercially.
These include the Thor system from Daylight
(65), Accord and RS3 Discovery from Accelrys
(66), and Unity from Tripos (67). Personal
database systems that can be implemented on a
desktop computer include ISIS/Base (68),
Accord for Access (66), and Team Works from
Afferent (69).


3.2 Registering Chemical Information 

Chemical structure registration is an essential
activity in drug discovery. The structures
that have been developed
by a pharmaceutical company constitute the
"crown jewels" of chemical information, and
they must be properly and securely archived.
The registration process usually involves
extracting, cleaning, transforming,
and loading the data, sometimes termed
ECTL.

3.2.1 Extract the Data. First, the
structures/reactions and corresponding data are
extracted, collected, and validated. Increasingly,
this is managed automatically, using output
from the high-throughput chemistry process.
Laboratory information management systems
(LIMS) that are "structure smart" can manage
chemical structure information starting
from the design of a reaction, through the
synthesis of the compounds, the chemical analysis
of the structures, the in vitro biological assay,
and finally the storage in the chemical
database. Certain steps, such as drawing the initial
structures/reactions, still remain an activity
for the chemist, although many chemical
information systems can take a generic
structure, enumerate the many specific
combinations, and lay out the structures automatically
(for example, the Monomer Toolkit by
Daylight, the Central Library program by MDL,
and CombiLibMaker by Tripos).

3.2.2 Cleaning and Transforming the Data.
Next, the structures/reactions are passed
through a filtering program that searches for
structure anomalies and corrects the chemical
representation. In this step, chemical
"business rules" are applied to the structures to
ensure that representations that can be drawn
in different ways, such as nitro groups and
tautomers, are represented by a single
convention. Specialized chemical manipulation
languages such as Genie Control Language
by Daylight, Cheshire by MDL, and Sybyl
Programming Language by Tripos are used to
implement this step. These languages are
versatile and easily programmed, and they can be
applied to other steps in the drug discovery
process, such as searching, property
calculation, and structure manipulation in general.

3.2.3 Loading the Data. Finally, the
structures/reactions are handed to a chemical
registration system. The chemical registration
system will typically "perceive" the
structures: identify atoms, bonds, rings,
stereochemistry, valence states, isotope values, and
other chemical information as needed. In the
case of reactions, it notes which structures are
reactants, which are products, and which are
agents or catalysts. Because there can be
many valid ways of drawing a structure,
depending on which atom you start with, a
structure may be given a canonical renumbering of
the atoms using a variant of the Morgan
algorithm (70). In the case of a linear
representation like SMILES, this canonicalization yields
a unique string for the structure, which can be
generated from any valid SMILES string
derived from the structure. In the case of a
structure stored in a connection table, the Morgan
algorithm results in the atoms being
reordered in the connection table to generate a
tree, branching outwards from the most
highly connected atom in the structure.
Because of the efficiency of indexing in modern
relational chemical databases, Morgan
renumbering is not used as much today as in the
past.
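
The core of Morgan-style renumbering can be sketched as follows. This is a simplified illustration, not the variant used by any particular registration system: it computes extended connectivity values by repeatedly summing neighbor values, starts at the atom with the highest value, and numbers outward breadth-first. Ties are broken by the input atom number, which a production algorithm would replace with atom type, bond order, and similar criteria. The bond list is the eleven-atom connection table from the PDB-style listing earlier in this chapter.

```python
def build_adjacency(bonds):
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def extended_connectivity(adj):
    """Iterate neighbor sums until the number of distinct values stops growing."""
    ec = {atom: len(nbrs) for atom, nbrs in adj.items()}  # start from degree
    n_classes = len(set(ec.values()))
    while True:
        nxt = {atom: sum(ec[n] for n in adj[atom]) for atom in adj}
        nxt_classes = len(set(nxt.values()))
        if nxt_classes <= n_classes:
            return ec  # keep the last values that still improved discrimination
        ec, n_classes = nxt, nxt_classes


def morgan_order(adj):
    """Number atoms outward from the highest-valued atom (simplified tie-breaks)."""
    ec = extended_connectivity(adj)

    def rank(atom):
        return (-ec[atom], atom)  # assumption: input number breaks ties

    start = min(adj, key=rank)
    order, queue = [start], [start]
    while queue:
        current = queue.pop(0)
        for nbr in sorted(adj[current], key=rank):
            if nbr not in order:
                order.append(nbr)
                queue.append(nbr)
    return order, ec


BONDS = [(1, 2), (1, 4), (2, 3), (3, 5), (3, 6), (4, 5),
         (4, 7), (4, 11), (5, 8), (5, 9), (6, 7), (7, 10)]
order, ec = morgan_order(build_adjacency(BONDS))
print("extended connectivity:", ec)
print("new atom order:", order)
```

Atoms that are topologically equivalent (here atoms 8 and 9, both attached only to atom 5) receive the same value, which is why real implementations need further tie-breaking rules to obtain a truly unique numbering.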

The registration system then computes
indexes. These include structure-searching
indexes, substructure or similarity keys,
molecular formula, molecular weight, and other
structure-based properties. Substructure keys
or "fingerprints" are particularly important.
They consist of a number of binary descriptors
for the presence of certain functional groups
or more generalized atom/bond combinations.
These keys can be used to filter structures
before searching. They are also used for
similarity calculations. Originally, substructure
search keys were always used to filter
structures before performing a substructure search
of the database. If a query structure contains,
say, a carbonyl group, then only
carbonyl-containing structures should be examined during
the substructure search. A key representing
the carbonyl group can be used to separate
structures that contain the group (the key turned
on, or set to 1) from those lacking it (key set to
0). Tree-based substructure searching does
not require prior filtering, so today,
substructure keys are primarily used for similarity
calculations between molecules. If the key values
of two structures are compared, the more keys
they have in common, the higher their
similarity value will be. When registering reactions,
the reactants and products may undergo
automated or semi-automated perception of
reacting bonds and atom centers (71). Generic
structures may be analyzed and "clipped" or
reverse-transformed to generate root and
member structures, which may be stored
separately (72).
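
The use of a key as a pre-search filter can be illustrated with bit operations. This is a minimal sketch under stated assumptions: the four key names and bit positions are hypothetical (they are not the ISIS key definitions), and the feature lists are assigned by hand rather than perceived from structures.

```python
# Hypothetical key dictionary: one bit position per substructure key.
KEYS = {"carbonyl": 0, "phenyl": 1, "secondary_amine": 2, "hydroxyl": 3}


def make_fingerprint(features):
    """Set one bit for each key present in the structure."""
    bits = 0
    for name in features:
        bits |= 1 << KEYS[name]
    return bits


def passes_screen(query_fp, candidate_fp):
    """A candidate survives only if it has every key set in the query."""
    return (query_fp & candidate_fp) == query_fp


DATABASE = {
    "acetophenone": make_fingerprint(["carbonyl", "phenyl"]),
    "phenol": make_fingerprint(["phenyl", "hydroxyl"]),
    "acetone": make_fingerprint(["carbonyl"]),
}

query = make_fingerprint(["carbonyl"])
survivors = sorted(name for name, fp in DATABASE.items()
                   if passes_screen(query, fp))
print(survivors)
```

The screen can only rule structures out; each survivor still has to be verified by atom-by-atom mapping, because sharing keys does not guarantee that the query substructure is actually present.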

Before finally storing the structure in the
database, the registration program may
search the database for some level of match to
the input structure or reaction, and skip the
registration if it is a duplicate. This is
sometimes termed "deduplication" through "exact
match" searching. There is usually some
redundancy in chemical databases, and to save
search time and disk space, most companies do
not store duplicate structures or reactions, but
rather store pointers to them.

The final step, after registering the
structure or reaction, is to assign it a unique
registry identifier, which is typically used
throughout the company to identify the given
structure/reaction and any chemical, biological,
or inventory data that are associated with it.
Some identifiers, like the Chemical Abstracts
Service CAS number and the Beilstein BRN,
have wide application, and these may be used
in addition to, or instead of, a
corporate-assigned external registry identifier.
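
Deduplication followed by identifier assignment can be sketched as a lookup keyed on a canonical representation. The `Registry` class, the `CMPD` prefix, and the identifier format are hypothetical; the sketch also assumes the caller supplies an already canonicalized string (for example a unique SMILES), since the canonicalization itself is outside its scope.

```python
class Registry:
    """Toy registration system: one identifier per canonical structure key."""

    def __init__(self, prefix="CMPD"):
        self._prefix = prefix
        self._by_key = {}  # canonical key -> registry identifier

    def register(self, canonical_key):
        """Return (identifier, is_new); duplicates get the existing identifier."""
        existing = self._by_key.get(canonical_key)
        if existing is not None:
            return existing, False  # exact match found: skip registration
        reg_id = f"{self._prefix}-{len(self._by_key) + 1:06d}"
        self._by_key[canonical_key] = reg_id
        return reg_id, True


registry = Registry()
first = registry.register("CC(=O)O")
second = registry.register("CCO")
repeat = registry.register("CC(=O)O")
print(first, second, repeat)
```

Returning the existing identifier for a duplicate is the "pointer" behavior described above: new data for a known compound attach to the record that is already registered.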

3.3 Searching Chemical Structures 
and Reactions 

The type of chemical structure and reaction 
searching that a chemist does usually depends 
on the current stage of a project. For example, 
if the chemist is starting a new therapeutic 
project, a therapeutic activity search might be 
conducted, using a database such as the Der¬ 
went World Drug Index, the MDL Drug Data 
Report, or the MDL Comprehensive Medicinal 
Chemistry database. Retrieving many search 
hits, the chemist might organize them by sort¬ 
ing on name, molecular weight, ring system, 
or some topological basis. If the resulting list is 
too large, the chemist might perform a cluster 
analysis of the structures to see what general 
classes of compounds have been synthesized in 
the past. After sampling from the various clus¬ 
ters, and identifying a handful of interesting 
structures, the chemist might perform a sub¬ 
structure search to find structures that con¬ 
tain the features that are felt to be important 
to activity (i.e., the pharmacophore). If that 
search returns too many hits, the search query 
can be refined by making it more specific. If 
the search returns too few hits, the search 
query can be relaxed, or a similarity search 
can be used to find structures in the topologi¬ 
cal neighborhood of the query structure. 
Eventually, a number of structures will be ob¬ 
tained as candidates for synthesis and/or test¬ 
ing. 

The next step is to design a set of reactions
to synthesize the compounds. One or more
reaction databases can be searched to find
whether any reactions give the desired
structures as products or give structures that are
similar to the desired ones. The chemist may
also use reaction similarity searching (73) and
searching across reaction schemes (e.g., if
A + B → C + D and C + E → F + G, a reaction
scheme search will find the query A → F) (74).
Once a reaction is found, the chemist needs to
decide what reagents to use in the synthesis
and where to obtain them. The selection of
reagents will usually be based on a
combination of physicochemical property
considerations (i.e., QSAR and diversity), tempered by
the chemist's experience and preferences, and
balanced by synthetic feasibility and
economics. The reagents may be located in-house, or
they may require ordering from a chemical
supplier.
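
Searching across reaction schemes amounts to following product-to-reactant links from one reaction to the next. The following is a minimal sketch, assuming each reaction is stored as a pair of reactant and product tuples; the function name `scheme_search` is hypothetical, and the sketch deliberately ignores whether the co-reactants (B and E in the example above) are available.

```python
from collections import deque


def scheme_search(reactions, start, target):
    """Return the indices of reactions linking start to target, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        compound, route = queue.popleft()
        if compound == target:
            return route
        for i, (reactants, products) in enumerate(reactions):
            if compound in reactants:
                for product in products:
                    if product not in seen:
                        seen.add(product)
                        queue.append((product, route + [i]))
    return None


# The two-step scheme from the text: A + B -> C + D, then C + E -> F + G.
REACTIONS = [
    (("A", "B"), ("C", "D")),
    (("C", "E"), ("F", "G")),
]

route_a_to_f = scheme_search(REACTIONS, "A", "F")
route_e_to_d = scheme_search(REACTIONS, "E", "D")
print(route_a_to_f, route_e_to_d)
```

A breadth-first search is used so that the shortest chain of reactions is reported first.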

A completely separate approach to reaction
discovery is the reaction planning approach
implemented in such programs as Logic and
Heuristics Applied to Synthetic Analysis
(LHASA) (75). This program works by
searching a chemical knowledge base that contains
information on approximately 2300
retro-reactions or transforms. The chemist draws a
target molecule and indicates a strategy for
the reverse-synthetic analysis. The program
then searches the transform knowledge base
for transforms that satisfy the strategy the
chemist selected. The program decides which
transforms are suitable for the particular
target structure and displays the resulting
precursors to the chemist. The chemist can then
select a precursor for further analysis and
choose another strategy option, on which the
program returns a second level of precursors
in the same way. Processing continues in this
manner until the chemist is satisfied that one
or more of the precursors correspond to a
reasonable starting point for a synthesis.
Retrosynthetic methods have not become as
widely used in industry as reaction searching,
partly because the certainty of the reactions is
not guaranteed. Also, searching existing
reaction databases generally yields the desired
reaction or something close to it. Indeed, a major
problem with search results from reaction
databases is often an overabundance of hits,
which typically need further organization and
filtering to be useful. One approach to
organizing the results of reaction searching is to apply
some clustering or classification to the
reactions (76).

To support the workflow just described, a 
number of structure and reaction search types 
have come into use (Fig. 9.11). These are 
briefly described as follows. 

3.3.1 Exact Match Searching. Here, the
chemist has a particular structure (or
reaction) that he wishes to find in the database.
The structure/reaction is drawn using a
drawing program and then passed to a search
program. The program submits the query to the
search routine that typically generates index
values from the query that are of the same
type as those generated for structures/reactions
when stored in the database. The index
values are then used as filters to retrieve a set
of candidate structures/reactions. In ISIS,
these filters include the formula, the
molecular weight, and the flexmatch index, a numeric
hash code based on the presence of isomers,
tautomers, isotopes, salts, charges, and
stereochemistry (see Glossary). The resulting
filtered structures have the minimum set of
requirements to fit the search query, but
typically only a fraction of these structures
will fit the query exactly. Once this set of
candidate structures is obtained, the query
structure is mapped to the candidate structure
using a process known as atom-atom mapping,
which is known in topology as the "graph
isomorphism" problem. This mapping is
time-consuming, so the prior filtering step should
be as efficient as possible. Each structure that
maps exactly to the query is placed in the
result set or "hit list." To accommodate various
chemists' needs, exact match searching can
usually be "relaxed" to permit the finding of
isomers, tautomers, salts, charged or
uncharged species, etc. In the case of reactions,
variations of the reaction can be retrieved by
relaxing the constraints on the reaction
conditions, solvent, and catalyst (Fig. 9.12).

Figure 9.11. Search types depend on the nature of the chemical information.

Figure 9.12. Different degrees of exactness
can be defined by allowing tautomers, salts, and
isomers successively in the search.

In a Daylight Thor database, where the
structure is stored as unique SMILES, the
canonical query SMILES can be compared
lexically with strings in the database using fast
string comparison and indexing techniques, to
find exact match structures and reactions.
Because a structure in a Thor database consists
of a meaningful, canonical sequence of
characters, the computational efficiencies of string
searching and comparison can be applied
when searching the database. This is in
contrast to the highly specialized search
techniques used in other structure database
formats.

Figure 9.13. Example 2D substructure search
queries with various atom and bond query
features. The more features that are present,
the more flexible the search becomes, but the
search may also require more time to
complete. There is a trade-off between putting the
flexibility into the database (i.e., storing and
indexing multiple forms of a structure) and
putting the flexibility into the search query
and the search software.

3.3.2 Substructure Searching. A
substructure search is performed when a chemist has
in mind a pharmacophore consisting of a set of
functional groups or a substructure which he
knows must be present in the structures to be
retrieved. Only part of the molecule is drawn,
along with query features that generalize
atoms, bonds, and rings in the structure. Figure
9.13 shows some typical substructure query
features. The features include the following:

• Single atom: specifies a periodic table atom
that must be present or a more generalized
atom (hetero, metal, etc.) or "superatom"
(condensed functional group, such as Ph, Et,
Ala, etc.)

• Atom list: a list of atoms, any one of which
may be present

• "Any" atom: simply means that some
atom must be attached at the given position.
As with structures, the hydrogen atoms in
substructures are implicit, unless the user
requests a particular hydrogen count or
range at a given position

• Link node: specifies a range of
allowed atom or functional group links
between atoms

• Stereo bond: including Z/E/either or
up/down/either

• Markush feature: used for patent
representation, for representation of generic
structures for combinatorial chemistry, or
to limit the substituents that can be present
at a given position. Note that some systems
allow logical operations on Markush
features (if -OH at R1, then no -Cl at R2).

A specialized case of substructure
searching is 3D pharmacophore searching, in which a
substructure search is combined with the
measurement/generation of 3D features, to
identify models that could fit a 3D
pharmacophore. Figure 9.14 shows an example of a 3D
substructure search query that includes
various 3D features or constraints. A given
conformation of a molecule that is stored in the
database may not exactly match a given query,
but it could be modified by rotation about
single bonds to fit the query. For this reason,
conformationally flexible 3D searching is a
feature of most 3D database systems (77). When
searching conformers, the conformational
flexibility can be incorporated into (1) the
query, by tethering flexible groups to fixed
anchor points in the structure, (2) the database,
by storing multiple low energy conformations
for each structure, or (3) the search process, by
incorporating a rapid conformational analysis
into the 3D search algorithm. The last is the
most common approach and is a part of
database systems from Tripos, Accelrys, and MDL.

Figure 9.14. (a) Example 3D pharmacophore
search query, showing substructure, distance,
angle, and exclusion sphere constraints. (b)
Example result of a conformationally flexible
3D search using this query. The molecule was
"flexed" in 3D by rotating about the
highlighted single bonds to fit the query. The
atoms, bonds, and 3D features that match the
query are colored. One problem with
conformationally flexible 3D searching is that
unwanted hits can be conformed to fit the query.
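
The second option, storing several conformations per structure, can be sketched with a simple distance test. This is an illustration only: the feature labels, coordinates, and distance window are invented for the example, and a real 3D query would also include the substructure, angle, and exclusion-sphere constraints described above. It relies on `math.dist`, available in Python 3.8 and later.

```python
import math


def satisfies(conformer, constraints):
    """Check every (feature, feature) distance against its allowed range."""
    for (i, j), (low, high) in constraints.items():
        distance = math.dist(conformer[i], conformer[j])
        if not low <= distance <= high:
            return False
    return True


def flexible_hit(conformers, constraints):
    """A structure is a hit if any stored conformation fits the query."""
    return any(satisfies(conf, constraints) for conf in conformers)


# Hypothetical pharmacophore points (angstroms) in two stored conformations.
extended = {"N": (0.0, 0.0, 0.0), "O": (6.0, 0.0, 0.0)}
folded = {"N": (0.0, 0.0, 0.0), "O": (0.0, 4.0, 0.0)}

# Hypothetical query: N and O must lie 3.5 to 4.5 angstroms apart.
QUERY = {("N", "O"): (3.5, 4.5)}

print(satisfies(extended, QUERY), flexible_hit([extended, folded], QUERY))
```

The trade-off mentioned in the text is visible here: the more conformations are stored, the more likely a match becomes, including matches from conformations that are not energetically reasonable.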

Many different approaches to substructure
searching have been devised (78). In ISIS, the
fastsearch index file is used to retrieve
candidate structures. If needed, the query is then
mapped onto these structures using a
"backtracking" approach. This involves
successively matching atoms and bonds in the
structure to those in the query in a stepwise
manner. When a match fails at any given step,
the program backtracks to the last successful
step and selects an alternative atom or bond.
Once all the atoms and bonds have been
matched, the structure is considered a hit. An
issue of the Journal of Chemical Information
and Computer Sciences has been devoted to
substructure search methods (79). Hicks and
Jochum reported a comparison of several
substructure search algorithms in 1990 (80).
These authors found the Beilstein-Softron S4
search system to be superior in search speed at
that time.
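
The backtracking procedure can be sketched compactly. This is a generic textbook-style matcher, not the ISIS algorithm: atoms match on element symbol only, bond orders and query features are ignored, and the function name `substructure_match` and the atom numbering of the acetic acid target are choices made for this example.

```python
def adjacency(atoms, bonds):
    adj = {a: set() for a in atoms}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def substructure_match(q_atoms, q_bonds, t_atoms, t_bonds):
    """Map query atoms onto target atoms; return the mapping or None."""
    q_adj = adjacency(q_atoms, q_bonds)
    t_adj = adjacency(t_atoms, t_bonds)
    order = list(q_atoms)

    def extend(mapping):
        if len(mapping) == len(order):
            return dict(mapping)  # every query atom is matched: a hit
        q = order[len(mapping)]
        for t in t_atoms:
            if t in mapping.values() or q_atoms[q] != t_atoms[t]:
                continue
            # Every already-mapped query neighbor must be bonded to t.
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                mapping[q] = t
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[q]  # backtrack and try the next target atom
        return None

    return extend({})


# Target: acetic acid, with atom 1 the carboxyl carbon and atom 2 the methyl.
TARGET_ATOMS = {1: "C", 2: "C", 3: "O", 4: "O"}
TARGET_BONDS = [(1, 2), (1, 3), (1, 4)]

# Query: C-C-O. The first trial (a -> 1) dead-ends and forces a backtrack.
match = substructure_match({"a": "C", "b": "C", "c": "O"},
                           [("a", "b"), ("b", "c")],
                           TARGET_ATOMS, TARGET_BONDS)
no_match = substructure_match({"n": "N"}, [], TARGET_ATOMS, TARGET_BONDS)
print(match, no_match)
```

Because the worst case grows rapidly with structure size, the prior index-based retrieval of candidates matters: the fewer structures reach this step, the faster the search.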


3.3.3 Similarity Searching. The most gen¬ 
eralized type of structure/reaction searching is 
searchingfor "similar" structures or reactiong 
in the database. Chemical similarity has been 
a highly debated topic for some time, mostly 
from the standpoint of what constitutes good 
descriptors to use in the similarity calcula¬ 
tions (81). Nevertheless, there are some gen¬ 
eral approaches that are widely used, not be¬ 
cause of their theoretical soundness, but 
simply because they work for the chemist. For 
2D structures, the most useful and efficient 
similarity approach is key-based similarity. 
This involves computing the overlap between 
a query structure and a candidate structure 
using substructure or fragment keys. ISIS 
uses the 960 keys that are generated when the 
structure is registered. The overlap is typically
computed using the Tanimoto metric, which 
was first used in 2D structure similarity by 
Willett et al. (82). Depending on the nature 
and number of the keys, it may be desirable to 
weight the Tanimoto calculation inversely ac¬ 
cording to the prevalence of the key in the 
database. Thus, a cyclopropyl key, which may
not be highly prevalent in the database, and
would be "swamped" by other, less relevant 
keys in an unweighted similarity calculation, 
may have more influence in a weighted calcu¬ 
lation. This weighted calculation is used as the 
default in ISIS chemical databases. It is possi¬ 
ble for an ISIS database administrator to re¬ 
generate the keys using custom values of the 
weights to enhance differences in the similar¬ 
ity calculations and select, say, more "drug¬ 
like" molecules in the search. In the reaction 
domain, similarity can be defined in terms of 
the structures, the reactions, or a combination 
of the two (83). Other similarity search systems
have been described in the literature, including
the one used by CAS (84). It is also
possible to use 3D pharmacophore keys to 
compute similarity, although these have typi¬ 
cally not performed as well as 2D keys. It is 
possible that conformational flexibility so 
vastly expands the "chemical space" of the 
molecules that a limited number of keys is 
simply inadequate for 3D similarity calcula¬ 
tion. When attempting to predict the type of 
therapeutic activity a compound has, Briem 
and Lessel concluded that 2D and 3D keys 
have complementary information (85). 
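
As a rough illustration of key-based similarity, the sketch below computes the Tanimoto coefficient on sets of key identifiers, with an optional weighting step. The inverse-prevalence weight used here (database size divided by key count) is an assumption chosen for simplicity; the text does not give the weighting formula used in ISIS, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


def tanimoto(a: Set[int], b: Set[int],
             weights: Optional[Dict[int, float]] = None) -> float:
    """Tanimoto coefficient: (keys in common) / (keys in either structure).

    With weights, each key contributes its weight instead of 1.
    """
    def total(keys: Set[int]) -> float:
        if weights is None:
            return float(len(keys))
        return sum(weights.get(k, 1.0) for k in keys)

    union = total(a | b)
    return total(a & b) / union if union else 0.0


def inverse_prevalence_weights(database: List[Set[int]]) -> Dict[int, float]:
    """Assumed weighting: rare keys get large weights, common keys small ones."""
    counts = Counter(k for keys in database for k in keys)
    n = len(database)
    return {k: n / c for k, c in counts.items()}


# Toy database of key sets; key 1 is common, key 3 is rare.
db = [{1, 2}, {1, 2}, {1, 3}, {1}]
w = inverse_prevalence_weights(db)   # {1: 1.0, 2: 2.0, 3: 4.0}
query = {1, 3}
```

With these toy keys, a candidate that shares the rare key 3 with the query scores higher in the weighted calculation than in the unweighted one, which is the "swamping" effect the weighting is meant to counter.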

3.3.4 Reaction Searching. Reaction search¬ 
ing, sometimes called reaction indexing, has 
been available for over 20 years. Originally de¬ 
veloped as online searching systems, the intro¬ 
duction of in-house systems like REACCS al¬ 
lowed pharmaceutical companies to augment 
published reaction sources with their own re¬ 
actions and data (86). As with molecules, reac¬ 
tion storage has moved from proprietary data¬ 
base foundations to storage and access in 
relational systems. Reaction searching encom¬ 
passes many of the same types of searches 
used for molecules. A reaction typically con¬ 
sists of three types of structures: reactants, 
products, and catalysts or agents, along with 
textual information about yield, conditions, 
etc. Reactant and product structures undergo 
structural changes in the reaction, whereas 
agents do not. The atom and bond changes 
that occur in a reaction are isolated in one or 
more reacting centers of the reactants and 
products. The atom changes consist of 
changes in atom valence, charge, number of 
attached hydrogens, number of bonds attached,
retention or inversion of stereochemistry,
etc. Bond changes include making and
breaking of bonds, and changes in bond order 
and stereochemistry. When searching reac¬ 
tions, the chemist can search for exact, isomer, 
or substructure matches in the reactants, in 
the products, or both. The structure searching 
can be accompanied by a search of the reaction 
text information for yield and conditions. Sev¬ 
eral commercial reaction indexing systems are 
available from molecule database vendors, 
and online searching is even possible (87). 

In most reactions, the majority of the at¬ 
oms and bonds are not involved in the reac¬ 
tion, and they remain unchanged between re¬ 
actants and products. To avoid examining 
these unchanging atoms and bonds, most re¬ 
action indexing systems allow the user to 
mark, in the reactants and products, those at¬ 
oms and bonds that are involved in the reac¬ 
tion. These are termed reacting center atoms 
and bonds, and when they are present, they 
enable much faster reaction searching and 
they reduce the number of false hits obtained. 
A simple example is seen in Fig. 9.15. Some 
systems have semiautomatic perception of re¬ 
acting centers, which must usually be aug¬ 
mented or checked by a chemist, especially 
with complex transformations. 
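
The effect of reacting center marks on a search can be sketched in a few lines of Python. This is a deliberately simplified model, not any vendor's indexing scheme: reactions are reduced to sets of bond-type labels, and the `IndexedReaction` class and `search` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class IndexedReaction:
    name: str
    bond_types: Set[str]           # bond types present anywhere in the reaction
    reacting_bond_types: Set[str]  # bond types made, broken, or changed in order


def search(reactions: List[IndexedReaction], query_bond: str,
           must_react: bool = False) -> List[str]:
    """Return names of reactions containing the query bond type.

    If must_react is True, the bond must belong to a reacting center.
    """
    hits = []
    for rxn in reactions:
        pool = rxn.reacting_bond_types if must_react else rxn.bond_types
        if query_bond in pool:
            hits.append(rxn.name)
    return hits


# Toy index: both reactions contain a C-O bond, but it reacts in only one.
DEMO = [
    IndexedReaction("ester hydrolysis",
                    {"C-C", "C=O", "C-O", "O-H"},
                    {"C-O", "O-H"}),
    IndexedReaction("alkene hydrogenation of an unsaturated ester",
                    {"C-C", "C=C", "C=O", "C-O", "C-H", "H-H"},
                    {"C=C", "C-H", "H-H"}),
]
```

Searching `DEMO` for "C-O" without reacting center information returns both reactions; requiring the bond to be part of a reacting center removes the hydrogenation, where the ester group is only a spectator.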

As with molecules, it is also possible to do 
reaction similarity searching. Given a reaction 
with reactants, products, and agents, one can 
typically run molecule similarity searches for 
the reactants, the products, or both. This will 
retrieve reactions that have similar structures 
involved in them. This does not guarantee 
that the molecules undergo the same or even 
similar transformations. It is possible in some 
systems to also include the similarity of the 
transformation as part of the overall similar¬ 
ity search. This is usually carried out using 
special keys that have been generated for a 
fixed number of possible transformations. As 
with molecules, the more keys a query and a 
reaction have in common, the higher will be 
the similarity. 

Figure 9.15. Reaction substructure search query and some example hits. If no reacting center or
mapping information is used, all three hits are found. If reacting bond information is used, hit c is
excluded. If both reacting atom and reacting bond information are included, then false hits b and c are
excluded.

3.3.5 Searching Other Data. Data other
than structures and reactions must also be
searched in the drug discovery process. Various
systems exist for indexing and searching
literature and journal contents (88), patents
(89), material safety data sheets (90), and
chemical suppliers (91). Some useful tools include
the Accord ChemExplorer program,
which allows searching word processor documents
and files for particular chemical structures,
and the CambridgeSoft ChemFinder for
Word (92).

3.4 Chemical Information Management 
Systems and Databases 

A number of software and database vendors 
provide programs and database systems to im¬ 
plement representation, registration, and 
searching of chemical information in a corpo¬ 
rate environment. Some of these vendors have 
smaller personal chemical database systems 
that support registration and searching on a 
personal computer. A handful of academic and 
public domain systems are also available. Finally,
an increasing number of chemical information
systems are being made available on
the Internet. Some representative systems
that are being sold or have been discussed recently
in the literature are described below.

3.5 Commercial Database Systems 
for Drug-Sized Molecules 

Accelrys. A subsidiary of Pharmacopeia,
Inc., Accelrys was originally a provider of molecular
modeling software. They recently acquired
several companies that provide offerings
in the chemical information and
bioinformatics areas. The company provides 
unique databases including several for reac¬ 
tions. 

• BioCatalysis—biomolecules as catalysts 

• BioSter—pairs of biologically similar struc¬ 
tures for bioisosterism applications 

• Biotransformations—developed in conjunction
with the Royal Society of Chemistry

• Failed Reactions—those that did not proceed
as expected

• Metabolism—developed in conjunction 
with the Royal Society of Chemistry 

• Methods in Organic Synthesis—33,000 reactions

• Protecting Groups—functional group
protection with regio/stereoselectivity

• Solid Phase Synthesis — with emphasis on 
small-molecule and combinatorial chemis¬ 
try 

The chemical information programs pro¬ 
vided by Accelrys include several database sys¬ 
tems. 

• Accord for Excel and Access—relational 
chemical storage for Microsoft programs 

• Accord for Oracle—a chemical data car¬ 
tridge (see Glossary) 

• Accord Database Explorer—to access Accel¬ 
rys reaction databases 

• RS3 Discovery System—with programs for
chemical structure, data management,
high-throughput screening, and inventory

Accelrys also provides programs for descriptor
calculation, QSAR, and data mining (93).

The Beilstein Database. The Beilstein Database,
with over 8 million structures, is the oldest
in existence, based on the Beilstein Handbook
of Organic Chemistry, and contains data
that extend back to 1771. The database is produced
by the independent Beilstein Institute (94).
Access to the database is either through
Beilstein Online, available through STN and
Dialog, or through the Web using Crossfire
Beilstein, which is marketed by MDL GmbH,
formerly Beilstein Inc. (95). Data that are
stored include the structure, Beilstein and 
CAS Registry Numbers, names, formula, 
preparations, reactions, natural product isola¬ 
tions, and chemical derivatives. Physical prop¬ 
erties, if available, are also stored, including 
optical data, mechanical properties, multi- 
component system data, spectral and thermo¬ 
dynamic properties, as well as biological func¬ 
tion, ecological data, toxicity, and common 
uses. Citation data, including author, journal,
CODENs, and patent information, are also
stored. The data are organized into substance, 
reaction, and citation contexts, and a user can 
easily switch from one context to the other. An 
ACS symposium volume devoted to the Beil¬ 
stein database has been published (96). 

Chemical Abstracts Service. As a division of 
the American Chemical Society, CAS develops 
and manages the world's largest databases of 
chemical structures and reactions. 

• CAS Registry—35 million structures— 19.5 
million distinct structures — 13 million bio¬ 
sequences 

• CASREACT—4 million reactions

• CHEMCATS—2.5 million commercially 
available chemicals 

• MARPAT—500,000 searchable Markush 
structures 

The CAS databases are maintained online,
with searching allowed on a subscription basis.
SciFinder is a client/server application to
search CAS databases by author, keyword,
exact structure, and substructure. It includes a
"keep me posted" update feature, reaction information
back to 1974, nucleotide and protein sequence
searching, browsing of 1600 journals, and
integration of structure, data, and citation
information. STN International is a collection of
200 databases covering chemistry, life sciences,
engineering, patents, etc. STN Express
provides wizard-assisted searching, and STN
on the Web serves as a web client for STN. The
ChemPort program provides web access to
journals (97).

Daylight Chemical Information Systems, 
Inc. This company provides numerous third- 
party databases in the Thor format. These in¬ 
clude the following: 

• Databases of organic structures: Available
Chemicals Directory—250,000 structures,
Asinex catalog—115,000 structures,
Maybridge catalog—62,000 structures,
InfoChem SPRESI95—2.5 million structures

• Drug and biological databases: BioScreen
NP and SC—about 52,000 structures including
natural products, Pomona College
Medchem—36,000 structures with measured
LogP, National Cancer Institute—120,000
structures with cancer/HIV screening data,
Derwent World Drug Index WDI—60,000 drugs.

• Toxicity: Aquire—5300 EPA structures 
with aquatic toxicity, TSCA—100,000 EPA 
substances 

• Reactions: InfoChem ChemReact/ChemSynth—390,000
reactions with 470,000 structures,
InfoChem SpresiReact—2.5 million
reactions and 1.8 million structures

Software and applications from Daylight 
include the following (98): 

• Numerous toolkits: SMILES, Depict, 
SMARTS, Fingerprint, Monomer, Thor, 
Merlin, X-Widgets, Program objects, Re¬ 
mote Access, and Reaction Toolkits (see 
Glossary) 

• Daylight chemistry cartridge for Oracle: 
DayCart (see Glossary) 

• Thor database manager—to build and man¬ 
age thesaurus-oriented databases 

• Merlin searching of structures and data 

• Clustering package, with Jarvis-Patrick 
type cluster analysis 

• Rubicon, a program for building 3D models
using a distance geometry approach 

• PCModels for LogP and other physical prop¬ 
erty calculations 

• CombiChem Package to manage high- 
throughput synthesis 

• Reaction Package 

• DayCGI—a web development toolkit 

• A set of Java tools for chemical information
management 

Derwent Information. A division of Thom¬ 
son Scientific, Inc., Derwent is the leading 
supplier of value-added patent information. 
The Derwent databases, which are main¬ 
tained online, include the following: 

• Derwent World Patents Index—references 
to patents, including chemical structure and 
use patents 

• Patents Citations Index—bibliographic and
citation data; the Innovations Index combines
entries from WPI and PCI


• Derwent Selection database — customized 
subsets of the WPI 

The databases are available through sev¬ 
eral hosting services, including STN, Dialog, 
and Questel Orbit. User guides for the PCI 
chemical indexing are available online at Der¬ 
went (99). Chemical patents can also be 
searched using the Merged Markush Service, 
MicroPatent, and for Japanese patents, the 
Japanese Patent and Trademark Documents 
(ISTA) among others (100). 

The Gmelin Database. The most compre¬ 
hensive database of structures, properties, 
and citations in inorganic and organometallic 
chemistry is the Gmelin database, based on 
the Gmelin Handbook of Inorganic and Orga¬ 
nometallic Chemistry dating back to 1772. 
This database includes 1.4 million compounds 
including coordination compounds, alloys, 
solid solutions, glasses and ceramics, poly¬ 
mers, and minerals. As such, it is less valuable 
to drug discovery. The current Gmelin data¬ 
base is owned by the Gesellschaft Deutscher 
Chemiker and is licensed to MDL GmbH. 

MDL Information Systems, Inc. Owned by 
Elsevier Science Publishing, MDL is a long¬ 
time provider of in-house databases and soft¬ 
ware. Databases include the following: 

• Available Chemicals Directory ACD— 
300,000 structures — reagents and general 
chemicals, with supplier information 

• Bioactivity databases—AIDS database— 
43,000 structures and data from the Na¬ 
tional Cancer Institute, Comprehensive Me¬ 
dicinal Chemistry (CMC)—7500 common 
drug structures, MDL Drug Data Report 
(MDDR)—120,000 patented drug struc¬ 
tures 

• Reactions—ChemInform—850,000 reactions
and 1.2 million structures,
Theilheimer/Chiras/Metalysis—171,000 reactions
and 223,000 structures

• Metabolism—Metabolite—53,000 transformations—34,000
structures

• Toxicity—EPA RTECS-based—150,000
structures

• Material safety—OHS Material Safety Data
Sheets

Software from MDL includes the ISIS
scientific information system (ISIS/Draw,
ISIS/Base, and ISIS/Direct), Cheshire for
chemical structure manipulation, and Chime
and Chemscape for Web access. Combinatorial
and high-throughput chemistry programs include
Afferent, Central Library, Project Library,
Reagent Selector, and Elan. Biological
data management programs include Apex and
Assay Explorer; literature access is through
LitLink, reaction access through Reaction
Browser/Web, and molecular modeling
through Sculpt (101).

Tripos, Inc. Originally the major provider
of molecular modeling software, Tripos now
offers chemical information content in the
form of databases and the tools to manage
them. These include the following:

• Several Chapman and Hall databases in¬ 
cluding ones for organic structures (180,000 
structures), inorganic and organometallic 
structures (40,000 structures), natural 
products (105,000 structures), and pharma¬ 
cological agents (22,000 structures) 

• The National Cancer Institute structures in 
a Tripos-compatible format 

• The Derwent World Drug Index (60,000 
structures) 

Chemical information software offered by 
Tripos now also extends beyond just molecu¬ 
lar modeling. Their programs include the fol¬ 
lowing: 

• The Unity 3D database system, which fea¬ 
tures rapid flexible 3D pharmacophore 
searching 

• Concord and Stereoplex—for generating 3D 
models of database structures including 
multiple stereochemical isomers 

• ChemEnlighten for chemical data mining 

• The AUSPYX structure data cartridge for 
Oracle 

• A suite of programs for combinatorial
chemistry—Legion to build and store virtual
libraries, CombiLibMaker to enumerate
structures, Selector to define diversity
measures to select diverse subsets of structures,
and DiverseSolutions to apply chemical
diversity techniques to chemical populations
to characterize and populate chemical
space (102)

3.6 Sequence and 3D Structure Databases 

Sequence databases of biological macromole¬ 
cules are useful when defining new therapeu¬ 
tic targets. Databases for DNA, RNA, and pro¬ 
teins are available from such sources as the 
National Center for Biotechnology Informa¬ 
tion (NCBI) (103) and the European Bioinfor¬ 
matics Institute (104). Numerous online pro¬ 
grams and tools are available to researchers to 
search and align sequences, generate phyloge¬ 
netic analyses (chemical evolutionary trees), 
map genes, and predict secondary structure 
(105). The Protein Data Bank stores the larg¬ 
est collection of crystallographic, NMR, and 
molecular-modeling derived protein and nu¬ 
cleic acid 3D models (106). The Cambridge 
Crystallographic Data Center is the primary 
source for crystal structure data on small molecules,
with more than 250,000 entries. The
Cambridge Database is accessed with
the programs ConQuest for searching, Mercury
for structure visualization, and Vista for
numerical display and statistical analysis
(107).

3.7 In-House Proprietary and Academic 
Database Systems 

Larger chemical and pharmaceutical firms 
have, over the years, developed in-house sys¬ 
tems with capabilities that are specific to the 
chemist's needs. Today, the costs of develop¬ 
ing from scratch and maintaining an in-house 
system are prohibitive, especially because 
commercial chemical information systems are 
highly efficient and customizable. Personal 
chemical information software is still being 
developed and reported in the literature. Ex¬ 
amples include a relational database pat¬ 
terned after the Upjohn Cousin system (108), 
and CheD, which is a SQL-based system with a 
Web client (109). 

Commercial personal database systems are 
available from several vendors, as described 
above. These products extend the productivity 
of an individual chemist or a small workgroup, 
but are not designed for corporate or enterprise
applications. Other personal chemical
database programs that are available include
ChemFinder from CambridgeSoft, ChemFolder
from ACDLabs, ChemWindow from
Softshell, and Aura-Mol from Cybula (110).


4 CHEMICAL PROPERTY ESTIMATION 
SYSTEMS 

The design and screening of drug candidates is 
increasingly being conducted in silico. This is 
made possible by improvements in programs 
for property calculation and estimation. Here, 
the term property calculation refers to the 
generation of some topological (depending 
only on the 2D structure), topographical (de¬ 
pending on the 3D conformation), or physico¬ 
chemical property of a molecule—directly 
from the structure. The term property estima¬ 
tion refers to the generation of some property 
as a function of other properties — either 
through a regression equation, a formula, 
neural network calculation, or some other in¬ 
direct means. 

The distinction between calculation and es¬ 
timation is important because some proper¬ 
ties, like molecular weight, polar surface area, 
molecular connectivity values, counts of 
chemical functional groups, partial charges, 
and other quantum mechanical descriptors, 
can be calculated precisely and de novo from 
the structure alone. Most of these properties 
have some fixed definition or algorithm that 
enables their calculation to be performed un¬ 
ambiguously, with little or no error. What er¬ 
ror is present is usually systematic or deter¬ 
ministic. A second class of properties, 
including LogP and other additive-constitu¬ 
tive properties, may be calculated by fragment 
additivity with various correction terms. 
These properties differ from de novo proper¬ 
ties because they are approximations to the 
true (sometimes measured) values. Often, 
there are multiple approaches to their calcula¬ 
tion. The errors in the calculation of these 
properties are statistical or stochastic. A third 
class of properties includes those that can only 
be estimated from other properties, using a 
regression analysis, neural network, or other 
linear or nonlinear function of variables. The 
errors in these properties can be complex and 
difficult to determine. For all these reasons, it
is important to carefully consider the use of
any given property for drug discovery pur¬ 
poses. Too often, properties are calculated 
simply because they are available, then used 
in a QSAR analysis, and possibly applied to 
future predictions — all without proper consid¬ 
eration of their precision, accuracy, and rele¬ 
vance to the chemical problem. 
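
The difference between the first two classes of properties can be made concrete with a short Python sketch. Molecular weight is computed exactly from the formula, whereas an additive-constitutive property is approximated by summing fragment contributions and correction terms. The function names are hypothetical, and the fragment values in `TOY_CONTRIBUTIONS` are placeholders invented for illustration; they are not taken from CLOGP or any other published scheme.

```python
from typing import Dict, Iterable

# Standard atomic masses (rounded) for a few common elements.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molecular_weight(formula: Dict[str, int]) -> float:
    """De novo calculation: fixed definition, no statistical error."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def fragment_additive(fragment_counts: Dict[str, int],
                      contributions: Dict[str, float],
                      corrections: Iterable[float] = ()) -> float:
    """Additive-constitutive estimate: sum of fragment values plus corrections.

    The result is only as good as the fitted contributions, so its error
    is statistical rather than systematic.
    """
    base = sum(contributions[frag] * n for frag, n in fragment_counts.items())
    return base + sum(corrections)


# Placeholder fragment values, NOT real LogP contributions.
TOY_CONTRIBUTIONS = {"CH3": 0.5, "CH2": 0.5, "OH": -1.0}

ethanol_formula = {"C": 2, "H": 6, "O": 1}
ethanol_fragments = {"CH3": 1, "CH2": 1, "OH": 1}
```

Two programs given the same formula should agree on the molecular weight, but two fragment schemes with different contribution tables can legitimately disagree on the estimated property, which is one reason the source of any such value should be recorded with it.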

Given this caveat, it must be noted that 
there are a multitude of programs available 
for the calculation of properties of structures. 
Some programs compute only a single prop¬ 
erty, like LogP. Others calculate a series of 
values in a given genre of property, like molec¬ 
ular connectivity (111) or BCUT descriptors 
(112). Still others compute a vast range of 
properties that include topological, topo¬ 
graphical, and physicochemical descriptors 
alike. It is beyond the scope of this chapter to 
detail all the programs and vendors that pro¬ 
vide property calculation and estimation soft¬ 
ware. Many of the calculations are provided as 
part of molecular modeling and QSAR pro¬ 
gram systems. Some programs and vendors 
whose products are solely for property calcu¬ 
lation are described below. 

4.1 Topological Descriptors 

Descriptors based on the 2D structure or sim¬ 
ply on the connectivity matrix of a structure 
have long been used for chemical similarity 
and for property correlations. Because they of¬ 
ten lack any relationship to mechanism, these 
descriptors are best used within a congeneric 
series or at least a set of similar structures. 
They may be empirically useful for cluster 
analysis and chemical library design, because 
they are effective at representing structure 
differences and similarities. A few programs 
and providers of topological descriptors in¬ 
clude the following: 

• Barnard Chemical Information — provides 
chemical Fingerprint Generation Pack—to 
compute fragment-based fingerprints for 
cluster and diversity analysis (113) 

• DRAGON—implementation of about 1400
descriptors of Todeschini and Consonni
(114) including constitutional, topological,
autocorrelation, geometrical and functional
groups, and including simple molar refractivity,
polar surface area, and Moriguchi

• Molconnz-EduSoft LC—provides MOLCONNZ
molecular connectivity and electrotopological
state descriptors of Kier and Hall (115)

4.2 Physicochemical Descriptors 

As a complement to topological descriptors, 
physicochemical descriptors often have a 
strong relationship to mechanism, and are 
widely used in lead optimization and QSAR. 
The classic triad — steric, electronic, and li¬ 
pophilic descriptors — are considered the foun¬ 
dation of QSAR, and adequate coverage of the 
space of these factors is still a major goal in 
drug discovery. The most common physico¬ 
chemical descriptor is LogP, the 1-octanol/wa- 
ter partition coefficient. Because it is so impor¬ 
tant, a number of programs and vendors 
provide LogP calculations based on a variety 
of methods. Many of these programs also compute
other physicochemical properties, such
as pKa and solubility. 

• BioByte, Inc.—developers of CLOGP, pre¬ 
mier LogP, and molar refractivity calculator 
(116) 

• Syracuse Research Corporation—provides
KOWWIN and 11 other structure-based
property calculations (117)

• CompuDrug Ltd.—the PALLAS System—
including programs for pKa, logP, logD
predictions, metabolism and toxicity, and
high pressure liquid chromatography
(HPLC) development (118)

• ACDLabs—physicochemical laboratory program
calculates pKa, LogP, logD, aqueous
solubility, boiling point and vapor pressure,
Hammett electronic constants, and a variety
of liquid properties (119)

• XLOGP—The Peking University LogP cal¬ 
culator—a similar version for proteins is 
available as PLOGP (120) 

• EduSoft LC—provider of Hint!-LogP—to
accompany the HINT! hydropathic interaction
modeling program (121)

• SciVision—provider of software for chemical
property calculation, and to estimate
QSAR, toxicology, oncology, and other biological
properties (122)

• Sirius-Analytical—provider of instruments
for LogP and pKa determination, and the
Absolv program to predict physicochemical
properties (123)

Most of the commercial molecular model¬ 
ing systems also provide some property calcu¬ 
lations, which range from simply calculating 
the polar surface area of a structure to a full 
range of topological and physicochemical de¬ 
scriptors. These may be based on fragment ad¬ 
ditivity, like most of the programs mentioned 
above, or they may involve correlations with 
quantum mechanical or even molecular dy¬ 
namics-based calculations. 

4.3 Absorption, Distribution, Metabolism, 
and Excretion Properties 

Perhaps the most critical aspect of drug devel¬ 
opment—the behavior of the drug in vivo — is 
also one of the least predictable. Each year, 
many drug candidates reach the very expensive 
stage of clinical trials, only to be discontinued 
because of problems with absorption, distribu¬ 
tion, metabolism, or excretion (ADME). Toxic¬ 
ity is often added to this acronym (ADMET), 
because we increasingly find critical differ¬ 
ences in the way children respond versus 
adults, males versus females, etc. Among the 
hopes that accompany the deciphering of the 
human genome, is that drug selection can 
someday be tailored to an individual's geno¬ 
type, to lessen the possibility of untoward drug 
response. For the present, drug designers are 
focusing increased attention on the prediction 
of ADME properties, pharmacokinetics, and 
in vivo behavior. Compared with topological 
and physicochemical predictions, ADME cal¬ 
culations are still rather crude and approxi¬ 
mate. They are usually based on correlations 
with other properties. If the correlation is
obtained with a neural network, the predictions
may be superior to those of simpler
regression-based approaches, but the model is
much harder to interpret. Some of the
programs that are used to predict ADME
descriptors include the following:

• LION Bioscience—provider of iDEA, a modular
ADME predictive system. The absorption
module predicts Caco-2 cell permeation,
and performs dose-response modeling
of the oral absorption. The metabolism mod¬ 
ule predicts first-pass effects and models 
metabolic parameters. Future modules are 
planned for distribution and elimination 
(124). 

• PASS —prediction of biological activity spec¬ 
tra—compares a test structure with those in 
a database of about 45,000 structures with 
known activity/toxicity, using topological 
descriptors and probability calculations (125). 

4.4 Property Calculations Online 

Many of the providers of software and databases
of chemical properties also provide online
calculation services. These include Daylight
Chemical Information (126), ACDLabs
(127), and Syracuse Research (128). In addi¬ 
tion, the following sites provide online calcu¬ 
lations of a variety of properties: 

• Molinspiration—calculates LogP, polar sur¬ 
face area, Lipinski Rule-of-5, and a drug- 
likeness index (129) 

• Alogp—VCCLab online LogP calculation 
(130) 

• PETRA—The University of Erlangen prop¬ 
erty calculation routines (131) 

• USEPA Suite—including implementations
of the Syracuse Research software (132) 
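
The Lipinski Rule-of-5 mentioned in the list above is simple enough to sketch directly. The thresholds below are the commonly quoted ones (molecular weight over 500, LogP over 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors); the function names and the default of tolerating one violation are assumptions for illustration and do not describe how any of the listed sites implement the rule.

```python
def rule_of_five_violations(mol_weight: float, logp: float,
                            h_donors: int, h_acceptors: int) -> int:
    """Count how many of the four Rule-of-5 limits a compound exceeds."""
    return sum([
        mol_weight > 500,
        logp > 5,
        h_donors > 5,
        h_acceptors > 10,
    ])


def passes_rule_of_five(mol_weight: float, logp: float,
                        h_donors: int, h_acceptors: int,
                        max_violations: int = 1) -> bool:
    """Flag a compound as acceptable if it has at most max_violations."""
    violations = rule_of_five_violations(mol_weight, logp, h_donors, h_acceptors)
    return violations <= max_violations
```

The input values would come from property calculators such as those described in this section, so the reliability of the filter depends on the reliability of the LogP estimate fed into it.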

5 DATA WAREHOUSES 
AND DATA MARTS 

Even relational databases have their limita¬ 
tions when dealing with huge amounts of data 
and high user traffic. The burden of continual 
data updating and registration, activities
known as OnLine Transaction Processing
(OLTP), can considerably slow down searching
and report generation, activities known as
OnLine Analytical Processing (OLAP). For
this reason, it is becoming common in the da¬ 
tabase field to build special large databases de¬ 
signed primarily for searching purposes— 
so-called data warehouses (133). These warehouses
have pre-computed indexes and tables
that facilitate repeated searching. A special
database architecture, known as the star 
schema, facilitates OLAP activity. In this de¬ 
sign, one or more large fact tables contain 
records of frequently searched data for each 
object (e.g., structure or reaction) in the data¬ 
base. The fact table is joined to smaller dimen¬ 
sion tables that contain the relational infor¬ 
mation. The schema is known as a star schema 
because the architecture resembles a many- 
pointed star, with the fact table at the center, 
and dimension tables at the ends of the arms. 
The design of the fact and dimension tables in 
the warehouse should reflect the searching 
habits of the users to get the best perfor¬ 
mance. Probably the first mention of data 
warehousing in the pharmaceutical area was 
that of Axel and Song in 1997 (134). 

5.1 Data Warehouses of Chemical 
Information 

A data warehouse is designed to consolidate 
structures and data from many diverse 
sources, including relational databases, flat 
databases, and structure and data files. It is 
considered to be multidimensional. A true 
chemical data warehouse might contain se¬ 
quences, 2D structures, 3D models, Markush 
structures, and reactions — all in the same da¬ 
tabase. No such commercial database cur¬ 
rently exists, but databases presently being 
developed at MDL and other vendors are ex¬ 
amples of chemical data warehouses of struc¬ 
tures and their reactions. The MDL data ware¬ 
house framework is termed the concordance. 
The fact table for the concordance is the source
table, which brings together structure and re¬ 
action identifiers from all the various data 
sources and links them to the unique struc¬ 
tures in the warehouse (Fig. 9.16). Using the 
concordance, a substructure search can re¬ 
trieve a set of unique, unduplicated struc¬ 
tures, along with pointers to all the relevant 
identifiers and reactions in the various data 
sources. Similar pointers exist to the original 
citations and stored data. 

Physicochemical properties that are based
solely on the structure are stored in the data
warehouse, but properties that are data-source
dependent, such as citation or biological
activity, are only referenced. A typical use
of a chemical warehouse is to search for a set of
structures that satisfy a search query, then
drill down using a web browser to access the
original data sources. In the case of reactions,
the user might retrieve and browse a list of
reactions that contain the structures that
were found in the search. In addition to drill-down,
a "hop-into" facility allows passing a set
of structures into a search program or web
browser that is native to the source database
being accessed.

Figure 9.16. Star schema design of a chemical data warehouse. The central source table allows
access to the External-ID of every molecule, arranged by source database. These External-ID values
can be used to build multidimensional views of the data. For example, to see all the reactions with
products that can be found in source database ACD, one would combine data from the source
dictionary table (Source_ID for database ACD), the reactions table (Struct_ID and Role), and the
moltable (Struct_ID), using identifiers (External-ID) from the central source table.
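The star schema join described for Fig. 9.16 can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names, and rows below are hypothetical stand-ins loosely modeled on the figure caption, not the actual MDL concordance schema.

```python
import sqlite3

# Hypothetical table and column names, loosely following Fig. 9.16.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE moltable    (struct_id INTEGER PRIMARY KEY, smiles TEXT);
CREATE TABLE source_dict (source_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE reactions   (rxn_id INTEGER, struct_id INTEGER, role TEXT);
-- Central fact table: one row per (structure, source database) pair.
CREATE TABLE source      (struct_id INTEGER, source_id INTEGER, external_id TEXT);

INSERT INTO moltable VALUES (1, 'O=C(O)c1ccccc1'), (2, 'CC(=O)O'), (3, 'CCO');
INSERT INTO source_dict VALUES (10, 'ACD'), (20, 'OTHER');
INSERT INTO reactions VALUES (100, 1, 'product'), (100, 3, 'reactant'),
                             (101, 2, 'product'), (102, 3, 'product');
INSERT INTO source VALUES (1, 10, 'ACD-0001'), (2, 20, 'OTH-0042'), (3, 10, 'ACD-0007');
""")

def reactions_with_products_in(source_name):
    """Return (rxn_id, external_id) for reactions whose product is in the named source."""
    rows = conn.execute(
        """
        SELECT r.rxn_id, s.external_id
        FROM reactions r
        JOIN source s      ON s.struct_id = r.struct_id
        JOIN source_dict d ON d.source_id = s.source_id
        JOIN moltable m    ON m.struct_id = r.struct_id
        WHERE r.role = 'product' AND d.name = ?
        ORDER BY r.rxn_id
        """,
        (source_name,),
    )
    return rows.fetchall()

print(reactions_with_products_in("ACD"))
```

The fact table carries only identifiers, so the same join pattern serves any source database by changing the name passed in.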

5.2 Data Marts of Chemical Information 

For certain purposes, like reagent selection, a
data warehouse is too large and comprehensive,
and a much smaller database is sufficient.
Such a data mart has the same architecture as
a data warehouse, but it has only a single dimension
of structural data, for example, synthetic
reagents. The MDL Reagent Selector
program is one example of a data mart of reagent
structures, with information on their
price and availability from various suppliers
(Fig. 9.17). It has a fact table that links structures
to their identifiers in the various source
databases. It stores properties that can be
used to filter reagents, such as the molecular
weight and LogP, and it has pointers to supplier
information stored in the MDL Chemical
Products Index (CPI) database. To aid in reducing
the size of a hit list, a Reagent Selector
user can filter reagents and sort on properties,
availability, presence or absence of functional
groups, etc. (Fig. 9.18). Further list reduction
can be achieved by clustering the structures
by means of a cluster analysis using substructure
keys as descriptors.

Figure 9.17. Reagent Selector, an example of a chemical data mart. Various components of the
system are shown, including the data sources, the daemon program that automatically updates the
mart, the concordance database, and the client/server architecture, which is implemented in a
three-tier system.
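The kind of filtering and sorting described above can be sketched in a few lines of Python. The record layout, the function name, the supplier counts, and the logP figures are illustrative assumptions only; they do not describe how Reagent Selector itself is implemented.

```python
from dataclasses import dataclass

@dataclass
class Reagent:
    name: str
    mol_weight: float
    logp: float          # illustrative value, not a measured one
    suppliers: int       # hypothetical number of suppliers listing the reagent
    groups: frozenset    # functional-group labels present

def filter_reagents(reagents, max_mw, max_logp, required=(), excluded=()):
    """Keep available reagents within the property limits that have every
    required group and no excluded group, sorted by molecular weight."""
    hits = [
        r for r in reagents
        if r.mol_weight <= max_mw
        and r.logp <= max_logp
        and r.suppliers > 0
        and set(required) <= r.groups
        and not (set(excluded) & r.groups)
    ]
    return sorted(hits, key=lambda r: r.mol_weight)

catalog = [
    Reagent("benzoic acid", 122.12, 1.9, 5, frozenset({"carboxylic acid", "aryl"})),
    Reagent("acetic acid", 60.05, -0.2, 9, frozenset({"carboxylic acid"})),
    Reagent("4-nitrobenzoic acid", 167.12, 1.9, 3,
            frozenset({"carboxylic acid", "aryl", "nitro"})),
    Reagent("stearic acid", 284.48, 8.2, 0, frozenset({"carboxylic acid"})),
]

hits = filter_reagents(catalog, max_mw=200, max_logp=3,
                       required=["carboxylic acid"], excluded=["nitro"])
print([r.name for r in hits])
```

Each criterion removes candidates independently, so filters can be applied in any order before the remaining list is clustered.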

An important feature of Reagent Selector is 
the daemon program, which runs in the back¬ 
ground. This agent-like program "awakens" 
on a fixed schedule and checks the various 
source databases for new or deleted structures 
or for changes in the structures and data. If 
any changes or additions are found, the daemon
updates the data mart accordingly, so users
will see the latest information when they
run searches.

Another aspect of chemical
warehouses and data marts concerns their
physical architecture. It is increasingly common
to see so-called "multitier" architectures
in which the client program (the "application
tier") may be a very "thin" Web client that
communicates with a more extensive "middle
tier" of programs that serve the immediate
needs of the client (see Glossary). Requests for
searching and registration, which demand da¬ 
tabase server resources, are passed from the 
middle tier to a "database tier" that corre¬ 
sponds mostly to the server part of former cli¬ 
ent-server architectures. There are many ad¬ 
vantages to this arrangement. The programs 
can be distributed onto different computers to 
optimize performance of the system. The mid¬ 
dle tier can be modified independently to ac¬ 
commodate changes in the client and server. 
From a development point of view, the various 
tiers in the architecture can be developed and 
maintained on their own schedule, with minimal
dependence on other components.

Figure 9.18. Filtering structures as part of the reagent selection process. The filter criteria include
criteria for structure complexity, logP, H-donor/acceptor, molecular weight, formula, and substructures.
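The update step performed by the daemon described in this section can be sketched as a comparison of two snapshots. The dictionary layout and the function name are assumptions made for illustration; the actual Reagent Selector daemon is not documented here.

```python
def sync_mart(mart, source):
    """Bring `mart` in line with `source`; both map structure ID -> record.

    Returns the IDs that were added, updated, and deleted."""
    added = sorted(k for k in source if k not in mart)
    deleted = sorted(k for k in mart if k not in source)
    updated = sorted(k for k in source if k in mart and mart[k] != source[k])
    for k in added + updated:
        mart[k] = source[k]
    for k in deleted:
        del mart[k]
    return added, updated, deleted

# Hypothetical records keyed by reagent ID.
mart = {"R1": {"price": 10.0}, "R2": {"price": 25.0}, "R3": {"price": 7.5}}
source = {"R1": {"price": 10.0}, "R2": {"price": 27.0}, "R4": {"price": 3.0}}
print(sync_mart(mart, source))  # one pass of the scheduled check
```

A scheduler would call such a function at fixed intervals, once per source database, so that searches always run against the refreshed mart.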


6 FUTURE PROSPECTS 

It is always difficult to predict the direction of 
advances in information management. Much
of the improvement in chemical structure
management and searching has been because
of advances in hardware and computer systems.
Moore's Law, which states that computer
power roughly doubles every 18 months,
has held since the 1970s, but threatens to 
break soon (135). As computer manufacturers 
hasten to avert the leveling off of computer 
performance gains, new technologies will 
surely affect the way chemical information is 
stored and searched. Based on current trends, 
it is possible to make some short-term predic¬ 
tions about directions in this field. 

• Information: Integration of chemical struc¬ 
ture data with other types of data. There is a 
welcome tendency to treat structures, mod¬ 
els, and reactions like other relational data. 
The biggest advantage of this approach is 
being able to run integrated searches using 
structures and data together in search queries.
An interesting approach pioneered by
the Merck group involves generating fingerprints
for key words in documents, then
searching for combined structure/document 
similarity (136). This approach and similar 
ones will be simplified by increasing integra¬ 
tion with relational database systems, as de¬ 
scribed below. 

• Knowledge Discovery in Databases: In the 
past, dating back to the DENDRAL project 
(137), attempts to apply artificial intelli¬ 
gence and machine learning to problems in 
chemistry and drug discovery have gained 
only moderate acceptance. One problem 
with expert system approaches has been the 
small base of information to access. In some 
cases, this has been a single expert chemist 
or a handful of example structures. This sit¬ 
uation is changing as data accumulates in 
databases. As Fig. 9.19 shows, data can be
organized, indexed, and stored in databases
to produce information. This information,
depending on how relevant, unique, and
complete it is, can be analyzed and modeled
to generate knowledge that might not otherwise
have been evident. Once this knowledge
is materialized, it can be managed,
shared, and deployed for future applications.
This process is termed Knowledge
Discovery in Databases (KDD), and it is becoming
more widely practiced (138).

Figure 9.19. Turning chemical data into knowledge. Data becomes organized and indexed to
produce information. Mining and analyzing information yields knowledge.

Data mining is the mechanism by which 
knowledge is derived from databases. It is gen¬ 
erally defined as the extraction of predictive 
models and associations from large volumes of 
data using statistical and pattern recognition 
techniques, usually for some competitive advantage.
Data mining is already well established
in the marketing, sales, and telecommunications
fields (139). Data mining is being
used increasingly by scientists, especially in
genomics and proteomics. Example applications
include the clustering of DNA array data
and using database information for protein
secondary structure prediction (140). Examples
are starting to appear in the field of drug
design. Depending on the stage of a drug discovery
project, one can mine chemical structure
data for diversity, similarity, or specificity,
as shown in Fig. 9.20. This figure shows
that the lead discovery, refinement, and optimization
phases of drug discovery proceed
through mini-cycles, each with its own data
mining requirements. So far, data mining has
mostly been applied to library design, QSAR,
and ADME prediction (141). Whether the
techniques become more widely used will depend
on the accuracy of predictions made using
them, on the availability of convenient
software, and most of all, on clean and relevant
data.

Figure 9.20. Mining chemical information in the drug discovery cycle. Each mini-cycle proceeds
until a sufficient number of suitable compounds become available.

• Software: Integration with relational sys¬ 
tems. A consequence of treating structures 
as relational data is a tighter integration of 
once-specialized structure management 
software techniques with relational database
systems. In Oracle, so-called "data cartridges"
are being increasingly used to allow
a chemist to treat structures like other rela¬ 
tional data in a search. Structures, models, 
and reactions can all be input, registered, 
and searched using standard SQL to which 
special operators have been added. SQL 
stands for Structured Query Language—the 
standard language for querying relational 
systems (see Glossary). For example, in the 
Daylight relational data cartridge, substruc¬ 
ture and similarity searches in a reaction 
database can be conducted directly in SQL 
as follows: 

Substructure: to find reactions containing
benzoic acid as a product:

SELECT * FROM RXN WHERE CONTAINS(SMILES, '>>O=C(O)c1ccccc1') = 1;

This statement translates to "Select everything
from the table named RXN,
where the SMILES field contains the substructure
string for benzoic acid as a
product". The "= 1" clause is an artifact
of the data cartridge implementation; it
does not necessarily mean that only a single
occurrence of the benzoic acid substructure
should be found.

Similarity: to find how many reactions
have a solvent that is 80% or more similar
to acetic acid:

SELECT COUNT(*) FROM MEDIUM WHERE SIMILAR(SMILES, 'OC(=O)C', 0.8) = 1;

This translates to "Tell me the number of
rows in the table MEDIUM where the
value of the SMILES column in that table
has an 80% similarity to acetic acid."
Again, the "= 1" parameter is an artifact.

This approach greatly simplifies the devel¬ 
opment of applications. Also, the searches can 
take advantage of optimization that is built 
into the relational database system. Fig. 9.21 
shows a Web browser that uses the MDL reac¬ 
tion data cartridge to perform structure and 
reaction searches. The use of a direct search¬ 
ing approach with an object-relational data¬ 
base for combined retrieval of chemical and 
biological information was reported by Cargill 
and MacCuish (142). In the field of data min¬ 
ing, the generation, storage, and deployment 
of predictive models is fully integrated into 
SQL Server 2000 and Oracle 9i, and this trend 
will soon extend to other relational database 
systems (143). 
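To make the 80% similarity criterion in the SQL example concrete, the following sketch counts fingerprints that meet a similarity threshold. It assumes that similarity is measured by the Tanimoto coefficient on bit fingerprints, which is a common choice but is not stated in the text, and the 8-bit fingerprints are invented for the example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints held as Python ints (bitsets)."""
    common = bin(fp_a & fp_b).count("1")
    total = bin(fp_a | fp_b).count("1")
    return common / total if total else 1.0

def count_similar(fingerprints, query_fp, threshold=0.8):
    """Count rows whose fingerprint is at least `threshold` similar to the query."""
    return sum(1 for fp in fingerprints if tanimoto(fp, query_fp) >= threshold)

# Made-up 8-bit fingerprints; real substructure keys are far longer.
query = 0b10110110
medium = [0b10110110, 0b10110100, 0b00000110, 0b11110110]
print(count_similar(medium, query))
```

Inside a data cartridge the same comparison runs next to the data, which is why a single SQL statement can replace application code of this kind.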

Another advance in chemical information 
software that promises to have considerable 
impact on drug discovery is "meta-layer" 
searching, as described by Hoctor (144). In 
this approach, queries entered by the chemist 
are submitted first to a middle-tier search en¬ 
gine, the meta-layer, which automatically and 
transparently generalizes and transforms the 
query into several queries. These are then 
submitted to various databases to retrieve
"more of the same" kinds of information. The
results are automatically formatted and pre¬ 
sented to the chemist in the context of a Web 
browser (Fig. 9.22). Thus, a name search 
might get converted automatically to a struc¬ 
ture, for substructure or reaction searching, a 
literature citation search, or a patent search, 
etc. The linking of searches across indirectly 
related literature can also be used to generate 
new knowledge (145). 

• Hardware and Operating Systems: The 
value of parallel and distributed processing 
was reported early in the development of 
structure search systems (146). Since then, 
some commercial products have adopted 
parallel processing. These mostly involve 
CPU-intensive searching like conformation- 
ally flexible 3D searching and docking. With 
the exception of such tasks, the speed of
most chemical information searching is determined
by data input and output (i.e., it is "I/O
bound") because chemical structures are a
highly "verbose" type of data. As chemical
information systems integrate more with relational
systems, they can take advantage of
the parallel and distributed processing capability
of the relational system. An important
development is the "24-7" availability of
data in chemical databases (24 h/d, 7 d/wk).
This can only be accomplished by distributing
and replicating databases across a
network.

Figure 9.21. Web client for an application that searches a relational reaction database. SQL statements
are used to select structures and reactions that satisfy the search query.

It would be pure speculation to estimate
the impact of changes in hardware and operating
systems on chemical information management.
Presently, Sun Microsystems is
probably the dominant Unix vendor in chemical
database management, largely because of
its network presence and its support of
Java. Microsoft has released its Windows
XP operating system, which merges the Windows
98 and Windows 2000 software streams
and will give the company continuing dominance in
the PC market for a while. Linux is quickly
catching on as an inexpensive alternative, and
it has a strong foothold in the molecular modeling
area, but it requires operating system
expertise and lacks a business software base.
Small handheld personal data assistants are
becoming more capable, and wireless computing
is on the rise. The standard desktop computer
has at least a 2.0 GHz processor with
512 MB or more of RAM, about a 30-72 GB
hard drive, and a combination read/write CD
with DVD. A relational database of 1 million
structures consumes about 3-4 GB of disk
space and can be substructure searched in a
few seconds, returning a hit list containing
thousands of structures. It takes much longer
for a chemist to wade through the results of
such a search or to process analytical or bioassay
results from a single combinatorial chemistry
experiment than to run most data
searches. In light of this, it seems evident that
the tools that will succeed are those that will
best assist the chemist in extracting relevant,
implicit knowledge from the data and deploying
that knowledge for future benefit.

Figure 9.22. Using meta-layer searching to retrieve implicit information. A name search query is
converted to a structure, which is then transparently searched to add structure-based search results
to the literature citation.

7 GLOSSARY OF TERMS 

2D Query Feature. A structural feature 
added to a 2D substructure search query to 
generalize the query or make it more specific. 
An example atom query feature would be spec¬ 
ifying a list of allowed atoms (Cl, Br, I) or lim¬ 
iting the number of attachments. A bond 
query feature would be allowing a single or 
double bond (S/D) or forcing the bond to have 
a particular stereochemistry. More complex
query features can be used to specify which
functional groups or substituents are allowed
at a given position (Rgroups), to specify a
range of chain lengths (link nodes), and
to specify atom, bond, or molecular data queries
(Sgroups).

2D Structure. In terms of chemical informa¬ 
tion, a collection of information about atoms 
and bonds that can be displayed in a manner 
such that a chemist would recognize it as a 
chemical structure. The atom and bond types 
and connections are usually explicit. The layout
of the atoms in the display may be explicit
(x,y coordinates) or implicit (determined at
the time of display). Hydrogen atoms may be
fully or partially suppressed to save storage 
space. 

3D Model. In terms of chemical informa¬ 
tion, all the information in a 2D structure plus 
at least one set of 3D atomic coordinates. This 
is a single conformation of the structure,
which is typically a low-energy conformation
or even a crystal or spectrometrically determined
3D structure. A 3D model may also contain
information about multiple low-energy
conformations and atom, bond, and molecular
properties such as partial atomic charge,
HOMO/LUMO energy, etc.

3D Query Features. Topographical features
that relate atoms, bonds, and other 3D features
to each other in a pharmacophore or 3D
substructure search query. Typical features
include (1) objects such as atoms, centers of
rings, electron lone pairs, and regions of exclusion
and (2) measurements of distance, angle,
dihedral angle, radius of exclusion, etc. Measurements
often have a range associated with
them (e.g., distance between a carbonyl oxygen
and a secondary amino nitrogen is 3.4-5.0 Å).
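A distance constraint of the kind given in the 3D Query Features entry reduces to a simple range test on atomic coordinates. The sketch below uses hypothetical coordinates and function names; a real 3D search would also have to consider conformational flexibility.

```python
import math

def distance(a, b):
    """Euclidean distance between two (x, y, z) points, in angstroms."""
    return math.dist(a, b)

def satisfies_distance(a, b, low=3.4, high=5.0):
    """True if the a-b distance lies within the query range [low, high]."""
    return low <= distance(a, b) <= high

carbonyl_o = (0.0, 0.0, 0.0)   # hypothetical coordinates
amino_n = (3.0, 2.0, 1.0)
print(round(distance(carbonyl_o, amino_n), 2), satisfies_distance(carbonyl_o, amino_n))
```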

Agent. A computer program that can run 
autonomously and on a schedule to perform 
database searches, maintenance, and report¬ 
ing activities that a chemist would otherwise 
have to do manually. An example would be an 
Internet notification service that sends the 
chemist an e-mail notification whenever a par¬ 
ticular database has been updated. 

Application Tier. In a multi-tier architec¬ 
ture, the collection of programs that run on 
the chemist's client or workstation machine. 
It is the tier of programs "closest" to the chem¬ 
ist in the architecture. Typically this may be a 
Web client program or other program with a 
GUI that allows the chemist to interact with 
the architecture. 

Artificial Intelligence. A branch of informa¬ 
tion science that attempts to use computer 
programs to perform or simulate human men¬ 
tal activity. Applications in chemistry include 
perceiving chemical structures, designing 
structures to fit topological or topographical 
criteria, designing 'novel' structures, etc. 
Many of the activities of AI overlap with, or 
contain elements of pattern recognition and 
data mining. 

ASCII. American Standard Code for Information
Interchange, a widely used system of
encoding alphanumeric information as
seven-bit codes, usually stored in eight-bit
bytes. The expansion of information
to include non-English characters
requires the use of larger (16- or 32-bit) char¬ 
acter sets such as Unicode. 


Atom List. In a substructure search query, 
a list of allowed (or perhaps disallowed) atom 
types. Often represented within brackets: 
[Cl,Br,I]. 

Atom Stereochemistry. Usually refers to 
tetrahedral stereochemistry at a given atom, 
which must be a chiral or prochiral center. 
The stereochemistry may be local (or relative) 
or global (based on CIP conventions). If it is 
local, it usually is termed "parity" or some 
other nonspecific term, to distinguish it from 
true global stereochemistry (R,S). Local atom
stereochemistry is a property of the atom and 
its nearest attached atoms. Global atom stere¬ 
ochemistry depends on the entire molecule 
and the stereochemistry at other chiral atoms. 
In some systems, the atom stereochemistry is 
perceived from the drawing of the structure, 
using "up" (wedged) and "down" (dashed) 
bond marks as cues. In linear notations, char¬ 
acters in the string can be used to specify the 
counterclockwise (@) or clockwise (@@) ori¬ 
entation of attachments at a given center. 

Atom-Atom Mapping. The procedure of as¬ 
signing each atom in a substructure query to a 
given atom in a candidate structure. The as¬ 
signed structure atoms must match the query 
atom in all characteristics, including atom 
type, stereochemistry, charge, attachments, 
etc. In some structures, a query may map onto 
the structure in many ways (multiple map¬ 
pings). Additionally, these mappings may 
overlap each other in terms of atoms and 
bonds, or they may be non-overlapping. Some 
search systems stop after the first mapping, 
whereas others perform exhaustive mapping,
until no further mapping can be found. 

Automap. A feature implemented in reaction
indexing programs like REACCS, which
attempts to automatically "discover" which
atoms and bonds are involved in a reaction
transformation (the reacting center atoms and
bonds). The chemist draws the reaction or the
reaction query as a set of reactants leading to a 
set of products, then invokes the Automap fea¬ 
ture, which causes the reacting atoms and 
bonds to be marked and identified. When re¬ 
acting center atoms and bonds are specified in 
both the query and the reactions in the data¬ 
base, reaction substructure searching is faster 
and gives fewer false hits.

Backtracking. One process that is used in
mapping substructure atoms and bonds to the
corresponding atoms and bonds in a candidate
structure. Given a certain query, say, an
amide group [-C(=O)N-], a backtracking
algorithm searches first for a carbon atom, 
then for an oxygen atom, then checks to see if 
they are doubly-bonded, and finally checks to 
see if a nitrogen is singly-attached. At any step 
in the process, if the check fails (e.g., with an 
ester, the final check would fail), the program 
"backtracks" to the last successful step and 
examines another eligible atom or bond. If no 
eligible atoms or bonds are found at that step, 
it backtracks to the next previous step in turn. 
This procedure is guaranteed to find a mapping
if one exists, but it can be slow, especially with large
or highly symmetric queries or structures, 
where a multitude of similar paths must be 
examined. Alternative approaches that use an 
indexed tree can be faster, especially for large 
databases. 
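The procedure in the Backtracking entry can be sketched as a short recursive search. This is a minimal illustration under simplifying assumptions: atoms are plain element symbols, bonds are stored in a dictionary keyed by atom-index pairs, and hydrogens, charges, aromaticity, and stereochemistry are ignored. The function names are hypothetical.

```python
def bond_order(bonds, i, j):
    """Order of the bond between atoms i and j, or None if not bonded."""
    return bonds.get((i, j)) or bonds.get((j, i))

def find_mapping(q_atoms, q_bonds, t_atoms, t_bonds):
    """Return the first query->target atom mapping found, or None."""
    mapping = {}

    def extend(q):
        if q == len(q_atoms):
            return True                    # every query atom is placed
        for t in range(len(t_atoms)):
            if t in mapping.values() or t_atoms[t] != q_atoms[q]:
                continue
            # Each bond from q to an already-placed query atom must exist,
            # with the same order, between the corresponding target atoms.
            ok = all(
                bond_order(q_bonds, q, p) is None
                or bond_order(q_bonds, q, p) == bond_order(t_bonds, t, mapping[p])
                for p in mapping
            )
            if ok:
                mapping[q] = t
                if extend(q + 1):
                    return True
                del mapping[q]             # backtrack and try the next atom
        return False

    return dict(mapping) if extend(0) else None

amide = (["C", "O", "N"], {(0, 1): 2, (0, 2): 1})                  # C(=O)N
acetamide = (["C", "C", "O", "N"], {(0, 1): 1, (1, 2): 2, (1, 3): 1})
methyl_acetate = (["C", "C", "O", "O", "C"],
                  {(0, 1): 1, (1, 2): 2, (1, 3): 1, (3, 4): 1})

print(find_mapping(*amide, *acetamide))
print(find_mapping(*amide, *methyl_acetate))
```

With acetamide, the search first tries the methyl carbon, fails to find a doubly bonded oxygen, and backtracks to the carbonyl carbon. With the ester no nitrogen is available, so every path fails.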

BCUT Descriptors. Descriptors of chemical 
structure that are derived from an eigen anal¬ 
ysis of the connection table of the structure. 
The class of BCUT descriptor depends on the 
quantities that are stored in the table (simple 
connection information versus electronic or 
steric interaction values). BCUT descriptors 
have found value in molecular diversity and 
chemical library design. 

Binary Data. Data stored in a file or data¬ 
base that is not chemist-readable, and usually 
cannot be converted to printable characters. 
Examples include connection table storage in 
a database, substructure search keys, and a 
graphics image of a structure. Note that some 
other data that is also not chemist-readable, 
like certain linear notations (e.g., a Chime 
string), may be made up of printable charac¬ 
ters and is not strictly binary data. 

Bioinformatics. The application of statisti¬ 
cal and mathematical techniques to turn se¬ 
quence data into useful biological informa¬ 
tion. The general goal of bioinformatics is to 
define the structure, location, and function of 
the proteins and nucleic acids that are the 
products of the processing of a genome. The 
application of bioinformatics in drug discovery 
is primarily the identification of new thera¬ 
peutic targets. 


Biological Data. This includes the results 
of in vitro and in vivo assays, toxicology and 
metabolism studies, DNA and protein array 
data, etc. It complements the chemical data, 
and increasingly, both chemical and biological 
data are being stored in large corporate rela¬ 
tional databases. At any given stage in the 
drug discovery process, obtaining and analyz¬ 
ing the biological data has traditionally been 
considered the more complex and rate-limit¬ 
ing step in the process. The application of 
high-throughput methods to screening and 
pharmacokinetic analysis is yielding consider¬ 
able benefit in the collection and processing of 
biological data. 

Bitset. A contiguous set of binary digits
(bits, 0/1) in computer memory. Bitsets are often
used in chemical information to store collections
of yes/no, presence/absence, and active/inactive
responses in a compact form.
Bitsets are used to store substructure search
keys for each structure (fingerprints), which
are used in similarity calculations. Bitset indexes
are common features of relational databases,
where a collection of bits, one for each
structure in the database, can store the presence
or absence of a given piece of data for
each structure, or, in the case of a substructure
search, a compact representation of
the result set from the search. The advantage
of bitset representation is that computers can 
perform very fast logical operations (union 
and intersection) on bitsets, which enables fil¬ 
tering and subsetting of large lists of struc¬ 
tures and data. 
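The logical operations mentioned in the Bitset entry can be illustrated with Python integers, which behave as arbitrary-length bitsets. The database size and the two hit lists are invented for the example.

```python
def make_bitset(n_bits, set_positions):
    """Build a bitset (a Python int) with the given bit positions set."""
    bits = 0
    for pos in set_positions:
        if not 0 <= pos < n_bits:
            raise ValueError(f"bit {pos} out of range")
        bits |= 1 << pos
    return bits

def members(bits):
    """List the positions of the set bits, lowest first."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

# One bit per structure in a hypothetical 8-structure database.
has_carboxyl = make_bitset(8, [0, 2, 3, 7])   # hit list from one search
is_active = make_bitset(8, [2, 5, 7])         # structures flagged active

both = has_carboxyl & is_active     # intersection
either = has_carboxyl | is_active   # union
print(members(both), members(either))
```

Each operation touches whole machine words at a time, which is why combining hit lists this way stays fast even for very large databases.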

BLOB. A Binary Large Object data type. 
This data type is used in Oracle, for instance, 
to store large amounts (e.g., up to several Gi¬ 
gabytes) of binary data. Storage of the connec¬ 
tion table and all the perceived structural in¬ 
formation for a registered structure is one 
example. Another example is the storage of 
the entire fastsearch index for a database, 
which can be accessed as a single object by the 
Oracle data storage and retrieval routines. 

Bond Stereochemistry. This complements 
the atom and molecule stereochemistry of a 
structure. A given double bond can be as¬ 
signed Z or E, or cis or trans stereochemistry 
based on the attachments. If the stereochem¬ 
istry is unknown or is a mixture, it can be 
assigned a value of "either." In a substructure
search, the bond stereochemistry can be specified
in the query to limit the scope of the
search. In some registration systems, the bond
stereochemistry of a given structure is perceived
from the input drawing of the structure.
In the case of linear notations, it can be
specified by characters in the string (e.g.,
Cl\C=C\Cl specifies trans-dichloroethene).

BRN. The Beilstein Registry Number, 
which can be used to access structures in the 
Beilstein database. 

Business Rule. An established convention 
for the representation of data in a given com¬ 
pany or laboratory. In the case of chemical 
structures, an example of a chemical business 
rule would be "all nitro groups should be
drawn as -N(=O)(=O) and not the charge-separated
form -N+(-O-)(=O)." In the
case of biological data, a business rule might
enforce the units in which a given piece of test
data is reported (e.g., dosage in mmol/kg).
Business rules can be enforced by preprocessing
data before it enters the database, or in the
case of multiple, diverse data sources feeding
into a data warehouse or data mart, the data 
can be transformed to the correct form before 
storage in the warehouse. 

Canonical Numbering. Reordering the 
numbering of atoms in a structure to a unique 
order, based on the extended counting of the 
number of attachments at each center, the
atom and bond types, etc. 
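The "extended counting" of attachments in the Canonical Numbering entry is usually associated with Morgan's extended connectivity procedure, and the sketch below follows that idea. It is an assumption-laden illustration: it uses attachment counts only, ignores atom and bond types, and leaves ties within a class unresolved, so it is not a complete canonical numbering.

```python
def extended_connectivity(neighbors):
    """Morgan-style extended connectivity values.

    `neighbors[i]` lists the atoms attached to atom i. Values start as the
    attachment count and are repeatedly replaced by the sum over neighbors
    until the number of distinct values no longer grows."""
    values = [len(n) for n in neighbors]
    while True:
        new = [sum(values[j] for j in n) for n in neighbors]
        if len(set(new)) <= len(set(values)):
            return values
        values = new

def symmetry_classes(neighbors):
    """Group atoms with equal extended connectivity, highest value first."""
    classes = {}
    for atom, value in enumerate(extended_connectivity(neighbors)):
        classes.setdefault(value, []).append(atom)
    return [classes[v] for v in sorted(classes, reverse=True)]

# Heavy-atom skeleton of n-pentane: C0-C1-C2-C3-C4
pentane = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(extended_connectivity(pentane), symmetry_classes(pentane))
```

A full canonicalization would number the atoms class by class and break the remaining ties using atom types, bond orders, and the numbers already assigned to neighbors.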

CAS Number. Chemical Abstracts Service Registry
identification number, very widely used to
identify chemical structures. 

Chem(o)informatics. By analogy with bioinformatics,
this is the application of statistical
and mathematical techniques to turn chemical
structure data into useful chemical and biological
information. It makes use of techniques
from statistics, pattern recognition,
artificial intelligence, and data mining to derive
useful predictive relationships between
structures and their biological or physicochemical
properties. Broadly considered,
cheminformatics also includes the input, storage,
management, and searching of chemical
structure information. 

Chemical Library. A collection of struc¬ 
tures, real or virtual, that is the current start¬ 
ing point for high-throughput screening or 
analysis. A library may be all the structures in
a database, or more commonly, a subset of
these. It might consist of diverse structure 
types or it might represent the enumeration of 
one or a few generic structures. Libraries can 
be classified according to the stage of discov¬ 
ery—i.e., diverse libraries for lead discovery, 
focused libraries for lead development, and op¬ 
timized libraries for lead optimization. 

Chemical Space. A loosely defined concept 
that all the known or possible chemical structures
define some multidimensional space in
which the structures are points. Structures 
that are topologically or topographically simi¬ 
lar to each other (i.e., look similar), cluster in 
chemical space, and by the principle of chem¬ 
ical similarity, should show similar physico¬ 
chemical and biological properties. This is the 
basis for diversity analysis of chemical librar¬ 
ies. The challenge is to select or discover prop¬ 
erties of the structures that define the chemi¬ 
cal space and can be used. 

CIP Stereochemistry. Cahn-Ingold-Prelog ste¬ 
reochemistry conventions. An IUPAC approved 
and widely used set of rules for assigning stereo¬ 
isomers based on atom and group priorities (see 
http://www.chem.qmw.ac.uk/iupac/stereo/). 

Cleaning and Transforming Data. When im¬ 
porting data from diverse data sources (files, 
databases, spreadsheets, LIMS systems, etc.) 
into a database or data warehouse, the data
usually needs to be standardized, checked, and 
sometimes transformed to some common for¬ 
mat and content. This allows faster search and 
retrieval, and serves as a check of data integ¬ 
rity. The rules that define the cleaning/trans- 
formation process are often termed "business 
rules," and in the case of chemical data, they 
may include checking and modification of 
chemical structures. 

Client-Server Architecture. A computer ar¬ 
chitecture in which a "server" computer (usu¬ 
ally a larger and faster machine at a central 
location) runs programs that communicate 
over a network with numerous workstations 
or "client" machines that reside in offices and 
laboratories. The server computer performs 
heavy duty computing tasks such as database 
searching and molecular and data modeling, 
in response to commands from the users of the 
client computers. It then communicates the
results back to the client machines. There, depending
on whether the client is "thick" (a
relatively large and capable application that
can display and manipulate data and struc¬ 
tures), or "thin" (a small program, possibly 
running in an Internet browser), the data is 
displayed, manipulated, and reported. Client- 
server architecture is two-tier, and is being 
supplanted by more versatile multi-tier ap¬ 
proaches. 

Clipping. The computer application of a 
chemical transformation to a set of structures. 
One example would be the conversion of a set 
of o-substituted phenols to a generic representation
with the ortho substituents collected
into an Rgroup attached to the parent phenol
structure. The reverse process, going from the
generic structure to all the specific non-generic
structures, is termed enumeration. Clipping
also includes functional group transformations,
such as converting a ketone to an
alcohol. In the process of cleaning and trans¬ 
forming chemical structure data, clipping may 
be involved when chemical business rules are 
“enforced.” 

CLOB. A Character Large Object data 
type. This data type is used in Oracle, for in¬ 
stance, to store large amounts (e.g., up to sev¬ 
eral Gigabytes) of character data. Storage of 
structures in a relational database in mole¬ 
cule-file format is an example. 

Cluster Analysis. The process of discovering 
"natural" groupings of points in the space of 
some measurements or descriptors. In chemi¬ 
cal information management, one often clus¬ 
ters chemical structures for diversity analysis 
or to subset the results of a search. Structures 
are most often clustered using functional 
group fingerprints as the descriptors. Cluster¬ 
ing methods usually consist of either parti¬ 
tioning methods like k-means and Jarvis- 
Patrick, or hierarchical methods, which may 
work by successively dividing the points (divi¬ 
sive clustering) or by successively aggregating 
points (agglomerative clustering). Cluster 
analysis is an important part of unsupervised 
data mining and pattern recognition. 
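As an illustration of the agglomerative approach named in the Cluster Analysis entry, the sketch below performs a simple single-linkage clustering of fingerprints. The Tanimoto similarity measure, the threshold, and the 8-bit fingerprints are assumptions chosen for the example; production methods such as Jarvis-Patrick use different merging criteria.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as Python ints."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0

def single_linkage(fps, threshold):
    """Start with one cluster per structure and keep merging any two
    clusters that contain a pair at least `threshold` similar."""
    clusters = [[i] for i in range(len(fps))]
    merged = True
    while merged:
        merged = False
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                if any(tanimoto(fps[i], fps[j]) >= threshold
                       for i in clusters[x] for j in clusters[y]):
                    clusters[x] += clusters.pop(y)
                    merged = True
                    break
            if merged:
                break
    return sorted(sorted(c) for c in clusters)

# Made-up 8-bit functional-group fingerprints for five structures.
fps = [0b11110000, 0b11100000, 0b00001111, 0b00000111, 0b11000000]
print(single_linkage(fps, threshold=0.6))
```

Structure 4 joins the first cluster through its similarity to structure 1 even though it is less similar to structure 0, which is the characteristic chaining behavior of single linkage.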

CML. The Chemical Markup Language.
Based on XML and HTML, it provides a standard
self-documenting molecule file and information
interchange format. Information is described
by tags and values. A CML document
can be "parsed" by a freely available computer
program that can return the structural infor¬ 
mation on demand. 

Combinatorial Chemistry. The application 
of high-throughput, parallel methods to the 
synthesis, analysis, screening, and testing of 
materials. This approach relies on robotics 
and computer-assisted methods to generate 
and analyze the results. Synthesis, analysis,
and testing of samples occur in the wells of
microtiter plates, which may contain as few as
96 samples or as many as a few thousand.
Solid-phase and solution methods are used,
and samples may be "one-bead-one-compound"
or they may contain mixtures, which
require "deconvolution" to determine which 
component is responsible for observed activ¬ 
ity. 

CONCORD. Rapid 2D to 3D conversion 
program introduced by Robert Pearlman's 
group in 1987. It generates approximate
low-energy 3D models from 2D connection tables.
It can also do stereo "multiplexing,"
where multiple configurations of stereochemically
ambiguous structures are generated.
Marketed by Tripos, Inc. 

Concordance. A data warehouse architecture
used in MDL relational chemical and reaction
databases. The central "fact" table of a
concordance has a record for each unique 
structure in the database, with pointers to the 
instances of the structure in various "source" 
databases. 

Connection Table. A table or matrix con¬ 
taining topological information about a chem¬ 
ical structure. A structure can be considered a 
"graph" in 2D space, with atoms as "nodes" 
and bonds as "edges." The atom connection 
table has one row and one column for each 
atom. The diagonal elements of the table are 
usually the atomic number, and the off-diago¬ 
nal elements have a zero or null if two atoms 
are not connected; otherwise they contain the 
order (1, 2, 3, aromatic, etc.) of the bond con¬ 
necting the row and column atom. A less com¬ 
mon connection table is the bond connection 
table, in which the rows and columns are the 
bonds in the structure, the diagonal elements 
are the bond order, and the off-diagonal ele¬ 
ments contain information about the atoms at 
the ends of the bonds. 
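
The atom connection table described above can be sketched in a few lines. The function name, the zero-based atom indices, and the use of integer bond orders are assumptions for this example only; aromatic bonds would need an extra code.

```python
def atom_connection_table(atomic_numbers, bonds):
    """Build a square atom connection table.

    atomic_numbers: one atomic number per atom, in atom order.
    bonds: iterable of (i, j, order) with zero-based atom indices.
    """
    n = len(atomic_numbers)
    table = [[0] * n for _ in range(n)]
    for i, z in enumerate(atomic_numbers):
        table[i][i] = z                      # diagonal: atomic number
    for i, j, order in bonds:
        table[i][j] = table[j][i] = order    # off-diagonal: bond order
    return table

# Formaldehyde, H2C=O, with atoms ordered C, O, H, H.
table = atom_connection_table([6, 8, 1, 1], [(0, 1, 2), (0, 2, 1), (0, 3, 1)])
```

The table is symmetric, and the zeros mark atom pairs that are not bonded.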





CONVERTER. A rapid 2D to 3D conversion 
program marketed by Accelrys. It uses a dis¬ 
tance geometry approach to modeling, which 
covers a wider range of conformations than 
other methods. 

CORINA. A rapid 2D to 3D conversion pro¬ 
gram developed by the Gasteiger research 
group at the University of Erlangen. It can 
handle macrocyclic ring structures, which can 
be problematic in other conversion programs. 
Chemists can access CORINA online
(http://www2.chemie.uni-erlangen.de/software/corina/free_struct.html).

Daemon. From Unix, a program that runs 
continually as a background process to per¬ 
form routine functions on demand or on a 
schedule. In the context of a chemical data 
warehouse, an example would be a registra¬ 
tion program that periodically checks input 
databases to see if there are any new struc¬ 
tures that need to be added to the warehouse. 
If there are, the daemon extracts the struc¬ 
tures from the source databases, transforms 
and "cleans" them if needed, and registers 
them to the warehouse. 

Data Cartridge. A popular term for user- 
customizable search "operators" that can be 
added to the SQL language of a relational da¬ 
tabase system. An example in chemical infor¬ 
mation is the addition of a substructure search 
(SSS) operator to integrate this type of search 
directly into a relational database search. One 
advantage of this approach is that the search 
"strategy" that the relational search program 
applies can take the complexity of the custom 
operator into account (the "cost") when per¬ 
forming the various search operations. 
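
The idea of adding a custom operator to SQL can be imitated with SQLite, which lets an application register its own functions. This is only a sketch under loud assumptions: the SSS function here is a toy substring test on SMILES text, not a real substructure search, and the table contents are made up. Unlike a true data cartridge, this mechanism does not expose the operator's cost to the query planner.

```python
import sqlite3

def sss(structure, query):
    """Toy stand-in for a substructure operator: plain substring test."""
    return 1 if query in structure else 0

conn = sqlite3.connect(":memory:")
conn.create_function("SSS", 2, sss)   # register SSS as a 2-argument SQL function

conn.execute("CREATE TABLE MOLTABLE (MOLSTRUCTURE TEXT, MOLWEIGHT REAL)")
conn.executemany(
    "INSERT INTO MOLTABLE VALUES (?, ?)",
    [("CCO", 46.07), ("CC(=O)O", 60.05), ("c1ccccc1", 78.11)],
)

# The custom operator is used directly inside an ordinary SELECT.
hits = conn.execute(
    "SELECT MOLSTRUCTURE FROM MOLTABLE "
    "WHERE SSS(MOLSTRUCTURE, ?) = 1 AND MOLWEIGHT < 500.0 "
    "ORDER BY MOLWEIGHT",
    ("CC",),
).fetchall()
```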

Data Compression. The process of trans¬ 
forming a potentially large amount of data 
into a smaller dataset, in such a way that re¬ 
versing the transformation results in no loss 
of information in the original data. A simple 
example of a compression operation is to re¬ 
place a string of blanks with a count plus a 
number that designates the character to be 
repeated. Compression programs include per¬ 
sonal computer programs like PKZIP and 
WINZIP, and Unix utilities like gzip. Compression
methods are often used before storing
data in a database and before transmitting
data over a network. When the data is retrieved
from the database or received on the
network, it must then be "decompressed" by
reversing the steps in the compression pro¬ 
cess. A chemical information example of com¬ 
pression is the conversion of an MDL molfile 
to a Chime string, which uses ZIP file com¬ 
pression methods. 
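
The blank-run example above, and a general-purpose lossless round trip, can be sketched as follows. The marker character and the three-blank threshold are assumptions chosen for this illustration; zlib is used here simply as a readily available lossless compressor and is not meant to reproduce the Chime format.

```python
import re
import zlib

MARKER = "~"  # assumed never to occur in the input text

def compress_blanks(text):
    """Replace each run of three or more blanks with MARKER + count + MARKER."""
    return re.sub(r" {3,}", lambda m: f"{MARKER}{len(m.group())}{MARKER}", text)

def expand_blanks(text):
    """Reverse compress_blanks by restoring each counted run of blanks."""
    return re.sub(rf"{MARKER}(\d+){MARKER}", lambda m: " " * int(m.group(1)), text)

record = "C" + " " * 10 + "0.0000" + " " * 8 + "O"
packed = compress_blanks(record)
restored = expand_blanks(packed)

# General-purpose lossless compression of the same record.
raw = record.encode("ascii")
round_trip = zlib.decompress(zlib.compress(raw))
```

In both cases, reversing the transformation returns exactly the original data, which is the defining property of lossless compression.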

Data Mart. Typically, a one-dimensional 
data warehouse — collecting data from multi¬ 
ple sources, extracting/cleaning/transforming/ 
loading (ECTL) the data, and then indexing it 
for analytical (OLAP) and data mining pur¬ 
poses, to be used by a given group or depart¬ 
ment. In chemistry, an example would be an 
inventory database, with structures, location, 
and purchasing information from many ven¬ 
dors for use by synthetic chemists. Like a data 
warehouse, a data mart often has a central fact 
table with each record containing pointers 
into other dimension tables that contain rela¬ 
tional data about the items in the fact table. 
The fact and dimension tables are connected 
in an organization called the star schema, 
which is a common design for data marts and 
warehouses. Data marts are often subsets of a 
data warehouse. 

Data Mining. The extraction of previously 
unknown predictive relationships from a large 
data set or database. Data mining makes use 
of descriptive unsupervised methods such as 
association and cluster analysis, as well as pre¬ 
dictive supervised methods such as decision 
trees, curve fitting, neural networks, and 
Bayesian methods. Data mining was once con¬ 
sidered "data snooping" and had a poor repu¬ 
tation. The need to analyze huge volumes of 
data and the success of these methods in mar¬ 
keting and finance have prompted scientists 
and statisticians to reconsider its use. 

Data Warehouse. A large relational data¬ 
base that collects data from multiple diverse 
sources and organizes it for optimal analytical 
searching and reporting (OLAP). A data ware¬ 
house is a superset of a data mart, containing 
archival and unchanging data that is impor¬ 
tant to several groups of researchers (i.e., mul¬ 
tidimensional). Data that enters a data ware¬ 
house does not usually come from original 
sources (i.e., chemists, instruments, or as¬ 
says). It usually comes from intermediate data 
sources and undergoes cleaning and transfor¬ 
mation (ECTL) before registry into the data 
warehouse. Typically, data is not deleted from 
a data warehouse, because historical trends
are important. For this reason, the warehouse 
grows very large over a long period of time, 
and thus its organization and indexing are 
crucial considerations. An example in chemis¬ 
try would be a single database containing 
structures, models, reactions, and data, all 
cross-referenced, and used by chemists, biolo¬ 
gists, and modelers. Typically, each group 
would extract their own data mart from the 
warehouse, containing information relevant 
to their needs. Data warehouses are often used 
in decision support systems (DSS) to provide 
data on which to base important corporate de¬ 
cisions. 

Database Tier. In a three-tier programming 
architecture, the database tier resides on a 
server computer with access to the databases 
and the programs that manage them. 

Deduplication. When registering into a 
chemical structure database, the process of 
finding whether the given structure already 
exists in the database. This usually involves 
performing an exact match search with the 
given structure as the search query. Note that 
the definition of exact match may vary with 
the database, and it may even be configurable. 
For example, some databases may consider 
tautomers to be acceptable as exact matches, 
whereas others may require a stricter definition.

Dimension Tables. In a data mart or warehouse,
the dimension tables store non-redundant
information about the entries in the fact
table of the database. For the chemical example
of an inventory data mart, the fact table
stores the various source database identifiers
of each unique structure in the data mart. A
dimension table of molecular formulas would 
store the formula for the unique structure in 
the mart, rather than storing the same for¬ 
mula for each occurrence of that structure in 
the various source databases. 

Drill-Down. Accessing data with increas¬ 
ing amounts of detail. When examining and 
browsing the results of a database search, a 
chemist can often request further information 
about a structure, even though that informa¬ 
tion was not included in the search. The pro¬ 
cess of accessing further information, often 
stored in a hierarchical manner, is termed
drill-down. The opposite process, which aggregates
data, is termed roll-up.

ECTL. The process of Extracting, Cleaning, 
Transforming, and Loading data into a data 
mart or data warehouse. The data in a mart or 
warehouse should be standardized, complete, 
unambiguous, etc. Raw data from files, instru¬ 
ments, databases, the Internet, etc., must usu¬ 
ally be preprocessed before it is "clean" 
enough to be used in decision making. Struc¬ 
tures present special problems because tau¬ 
tomers, isomers, salts, etc. may all represent 
valid forms. The use of chemical processing 
languages, which can search for substructures 
and make modifications of specific atoms and 
bonds, enables the enforcement of chemical 
business rules during the ECTL process. 

Encryption. The conversion of data in a 
readable or decipherable code into another, 
possibly undecipherable, code. The most com¬ 
mon encryption involves sensitive pieces of 
data like passwords and identification num¬ 
bers. In chemistry, it is sometimes necessary 
to encrypt larger pieces of information, such 
as chemical structures and the results of as¬ 
says—at least during passage of such informa¬ 
tion over networks or the Internet. Decryption 
of the information typically requires one or 
more keys, which are often built into the en¬ 
cryption and decryption software. 

Enumeration. The systematic substitution 
of all the Rgroup members in a generic struc¬ 
ture, giving each possible specific structure 
the generic structure represents. If some of 
the Rgroups are not converted in the process, 
it is termed partial enumeration.
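
Full enumeration is essentially a Cartesian product over the Rgroup member lists. The sketch below assumes a generic structure written as a text template with named placeholders; the template syntax, the function name, and the example substituents are all hypothetical.

```python
from itertools import product

def enumerate_generic(template, rgroups):
    """Substitute every combination of Rgroup members into the template."""
    names = sorted(rgroups)
    return [
        template.format(**dict(zip(names, combo)))
        for combo in product(*(rgroups[name] for name in names))
    ]

structures = enumerate_generic(
    "{R1}-C6H4-{R2}",
    {"R1": ["Me", "Et"], "R2": ["OH", "NH2", "Cl"]},
)
```

Two members for R1 and three for R2 give six specific structures; the count grows multiplicatively with each additional Rgroup, which is why large combinatorial libraries are usually stored in generic form.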

Equivalence Class. In the canonicalization 
of structures that have some element of sym¬ 
metry, certain atoms that are topologically 
equivalent may yield the same canonical num¬ 
ber. These atoms are considered to be in the 
same equivalence class. The concept of equiv¬ 
alence class is used, for example, in the Day¬ 
light Chemical Information Systems handling 
of reactions, to examine equivalent atoms 
when mapping reactant and product atoms. 

Exact Match Search. One type of structure 
searching in which a query molecule is 
searched for in a database of structures. To 
exactly match the query, the target structure 
must be topologically identical and not be a 
substructure or superstructure of the query. 





Extended Stereochemistry. A type of tetra¬ 
hedral or higher level stereochemistry that ap¬ 
pears, for example, in allenes, where the ste¬ 
reochemical center is not a single atom but a 
system of atoms and bonds that can be concep¬ 
tually collapsed to a single atom to yield a ste¬ 
reo center. 

External Registry Number. A unique "exter¬ 
nal" identifier assigned to a structure, 
reaction, 3D model, assay, etc. The external 
registry number is usually unique across data¬ 
bases, and it can be used as a key to link data 
from one database or table to another. Any 
given database may have its own "internal" 
identifiers that may not be unique across da¬ 
tabases. 

Fact Table. A central table in a data ware¬ 
house whose rows each represent one unit of 
primary importance in the warehouse. In a 
chemical warehouse, the rows of the fact table 
might correspond to unique structures in the 
database. In a biology data warehouse, each 
row might correspond to a single experiment. 
The fields in the fact table are mainly pointers 
to information stored in other tables, or they 
contain data that may be repeated in other 
tables but is stored in the fact table (i.e.,
denormalized) for rapid access. The fact table
connects to other "dimension" tables in the 
warehouse that contain specific information 
that is not duplicated. 

Fastsearch Index. Term used in MDL data¬ 
bases for a tree-structured index of all the 
structures in the database. The nodes in the 
tree represent increasingly complex substructures
or properties that all the structures at
or below a given node have in common. The
fastsearch index can be very large, but it 
makes possible very rapid substructure 
searching. 

Field. In database terminology, a column of
data in a table. Fields are commonly selected
in searches of the database, such as "SELECT
MOLSTRUCTURE, MOLWEIGHT FROM
MOLTABLE WHERE SSS(MOLSTRUCTURE,
'query.mol')=1 AND MOLWEIGHT<500.0".
Here, MOLSTRUCTURE and MOLWEIGHT
are fields in the MOLTABLE table. SSS is a
function that operates on the MOLSTRUCTURE
field to find molecules that contain the
structure contained in the file query.mol, as a
substructure.


Filter. A query or set of criteria designed to 
select a subset of a given set of data or results. 
Filters are usually applied to limit the number 
of hits from a search, or to limit the input to 
some analysis. Sometimes filters are designed 
to remove invalid rows of data, to randomly 
select a subset, or to remove rows based on the 
values of certain fields. A common application 
of filtering is in reagent selection, where reac¬ 
tive groups, multi-functional structures, or 
cost criteria may be applied to the selection of 
compounds for reactions. A filter can usually 
be expressed as a SELECT statement in a da¬ 
tabase search. 

Fingerprint. A set of measurements or de¬ 
scriptors, usually binary, that can be used to 
characterize and identify an object. In the case 
of a chemical structure, a common fingerprint 
is a set of substructure keys that represent the 
presence or absence of specific functional 
groups. Such keys can be used to compute the 
topological similarity of the structure to other 
structures, and can be used as filters in data¬ 
base searching. Other common fingerprints 
include IR and mass spectral fingerprints, and 
fingerprints of how a structure behaves in a 
set of biological assays. 
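
One common way to turn substructure keys into a similarity value is the Tanimoto coefficient, the number of shared keys divided by the number of keys set in either structure. The entry above does not name a particular measure, so the choice of Tanimoto, the key names, and the convention for two empty fingerprints are assumptions of this sketch.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as collections of set keys."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty fingerprints are identical
    return len(a & b) / len(union)

# Hypothetical functional-group keys for two structures.
ethanol = {"hydroxyl", "methyl"}
acetic_acid = {"hydroxyl", "methyl", "carbonyl", "carboxyl"}

similarity = tanimoto(ethanol, acetic_acid)
```

Here two of the four distinct keys are shared, so the similarity is 0.5. The same key sets can serve as a search filter: a structure lacking a key that the query requires cannot be a hit.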

Flat Database or File. Essentially a spread¬ 
sheet of data, in which a given row contains all 
the data about a structure. There are no hier¬ 
archical relationships in a flat database. Many 
older and proprietary structure databases 
were flat in structure. These are in contrast to 
relational databases that are more commonly 
used at present. 

Flexmatch Search. Term used in MDL 
structure searching to allow "relaxed" exact 
match searching of structures. One can spec¬ 
ify, for instance, that everything must match 
except bond orders, or stereochemistry, or valence
at atom centers, etc. By turning on or off
various flags, one can, for a given structure
query, retrieve isomers of various types, salts
of the structure, or instances of the structure
that may contain different values of certain 
types of attached data. 

Generic Structure. A structure convention 
that allows representation of, say, a combina¬ 
torial library, as a single, generalized struc¬ 
ture. The fixed parts of the structure are rep¬ 
resented by the "root" or "parent" structure, 
and variable parts are represented by 
"Rgroups" or "substituent groups" that can
each contain multiple substituents or frag¬ 
ments. 

Gigabyte. One thousand megabytes, or 10^9
bytes of data. The largest chemical structure 
databases presently contain a few tens to hun¬ 
dreds of gigabytes of data. A typical structure 
in a database may require a few thousand 
bytes of data to store the connection table, co¬ 
ordinates, and other structure-specific data. 

Graph and Subgraph Isomorphism. In chem¬ 
istry, the mapping of a structure or substruc¬ 
ture query to a target structure. All the atoms 
and bonds in the query (the nodes or vertices 
and edges of the "graph" of the query struc¬ 
ture) must be mapped to corresponding atoms 
and bonds in the target structure to generate a 
hit. 

Hash Code. Converting a set of numeric or 
character properties into a single, mostly 
unique, number, for the purpose of rapid 
lookup and retrieval. For example, in the case 
of chemical structures, it is common to gener¬ 
ate and store a hash of the molecular formula, 
so that when a user requests a formula search, 
the search query typed by the user is con¬ 
verted to the same hash number, and a single 
lookup in the index gives all the structures 
that correspond to the given formula. A hash 
code is often generated as a linear combination
of the possible values of each of the properties
(e.g., n_1 P_1 + n_2 P_2 + ..., where the n's are
selected such that the products never overlap).
If several structures have the same hash code, 
they are termed "collisions," and typically re¬ 
quire further processing—like substructure 
searching—to differentiate them. 
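
A minimal sketch of a molecular-formula hash built as a linear combination follows. It assumes a fixed element list and fewer than 1000 atoms of any element, so that multipliers of 1, 1000, 1000^2, ... keep the products from overlapping; both limits, and the function and variable names, are choices made for this example.

```python
import re
from collections import defaultdict

ELEMENTS = ["C", "H", "N", "O"]  # assumed element set; others raise KeyError
BASE = 1000                      # assumes fewer than 1000 atoms per element

def formula_hash(formula):
    """Hash a molecular formula as sum(count_i * BASE**i) over ELEMENTS."""
    counts = dict.fromkeys(ELEMENTS, 0)
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number or 1)
    return sum(counts[e] * BASE**i for i, e in enumerate(ELEMENTS))

# Index structures by hash; isomers share a formula and therefore collide.
index = defaultdict(list)
for name, formula in [("ethanol", "C2H6O"),
                      ("dimethyl ether", "C2H6O"),
                      ("methylamine", "CH5N")]:
    index[formula_hash(formula)].append(name)

hits = index[formula_hash("OC2H6")]  # element order in the query does not matter
```

The lookup returns both C2H6O isomers in a single step; telling them apart requires the further processing mentioned above.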

Hierarchical Clustering. One of three main 
types of clustering applied to chemical struc¬ 
tures (hierarchical, partitioning, and fuzzy 
clustering). In hierarchical clustering, a tree 
or dendrogram is constructed, with one struc¬ 
ture at each of the leaves of the tree. By "trim¬ 
ming" the tree at a given level, one can collect 
structures into a given number of clusters, 
such that all the structures in a single cluster 
have some level of similarity to each other. 

High-Throughput Chemistry. Application of 
parallel processing to the synthesis, analysis, 
and screening of structures. A subset of
high-throughput chemistry is combinatorial chemistry.


Hit List. Older term for a list of identifiers 
of structures or other objects obtained from a 
database search. A more modern term is
"result set."

HTML. HyperText Markup Language. The 
most commonly used specification language of 
the Internet. Other markup languages of interest
in chemistry include XML (eXtensible
Markup Language, for information in general),
CML (Chemical Markup Language, for chemical
structures), VRML (Virtual Reality Markup
Language, for 3D visualization), and PMML
(Predictive Model Markup Language, for data
mining).

Index. A secondary data field generated 
from one or more primary data fields, to en¬ 
hance the searching and retrieval of the pri¬ 
mary data. An index in a chemical database 
may be a characteristic of the database, such 
as Oracle indexes, or it may be a chemistry-specific
index such as a tree index for substructure
searching. Indexes require extra space,
and they typically must be created and main¬ 
tained by some administrative process in the 
database. 

Inventory Data. Typically, information 
about the availability of reagents for chemical 
synthesis. This includes the suppliers, package
sizes, purity, and cost of commercial reagents,
and the location, owner, and availability
of in-house reagents. Increasingly, this
information is being integrated with chemical 
structure databases and warehouses and with 
automated ordering and procurement pro¬ 
grams. 

Inverted Keys. When substructure search 
keys are generated for a structure, they may 
be stored in normal order (where each record 
represents a structure, and the bits or fields 
for that structure represent the keys). Alter¬ 
natively, they may be stored in inverted or piv¬ 
oted order, where each record represents a 
given substructure key, and the bits represent 
structures that have that particular key set. 
This type of storage benefits key searching, 
where a user wants all the structures that 
have a particular key set. 
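
The difference between normal and inverted key storage can be sketched with two dictionaries. The structure identifiers and key names below are hypothetical.

```python
def invert_keys(keys_by_structure):
    """Pivot structure -> keys into key -> set of structures."""
    inverted = {}
    for structure, keys in keys_by_structure.items():
        for key in keys:
            inverted.setdefault(key, set()).add(structure)
    return inverted

# Normal order: one record per structure.
normal = {
    "S1": {"hydroxyl", "phenyl"},
    "S2": {"carboxyl", "hydroxyl"},
    "S3": {"phenyl"},
}

# Inverted order: one record per key.
inverted = invert_keys(normal)

# Structures that have both keys set: intersect two inverted records.
both = inverted["hydroxyl"] & inverted["phenyl"]
```

With inverted storage, a key search reads only the records for the keys in the query instead of scanning every structure.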

Isomer and Tautomer Search. A search type
in which bond order, hydrogen counts, certain
atom valences, and bond or atom stereochemistry
may be allowed to vary from those specified
in the query. Such searching allows retrieval
of keto-enol tautomers, cis-trans
isomers, etc. The generalization of the search
may be a function of the query, of the search
process (see Flexmatch search), or both.

Java. Currently the most popular computer
language for Internet and middle-tier pro¬ 
gramming. Java is an object-oriented lan¬ 
guage developed by Sun Microsystems, that 
runs on multiple platforms and contains 
built-in features for networking, database ac¬ 
cess and security, graphics, etc. Other lan¬ 
guages widely used in chemical information 
programming include C, C++, and Perl. Java 
and C++ are called "object oriented" lan¬ 
guages that focus on the "business objects" of 
the application—like molecules and reactions. 
C and Perl are more "procedure oriented" lan¬ 
guages that focus on the things the objects do 
and the process that manages them. 

Join. The retrieval of data from more than
one table in a relational database, into a single 
result set. Depending on the structure of the 
data in the various tables, and the nature of 
the search query, extra data may need to be 
added to the result set to fill in certain fields, 
or the fields may be unpopulated in the result 
set. 
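
A small SQLite sketch shows how a join can leave fields unpopulated. The table names, column names, and values are invented for this illustration; a LEFT JOIN is used so that structures without assay data still appear in the result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE structure (regno INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE assay (regno INTEGER, ic50_nm REAL);
INSERT INTO structure VALUES (1, 'cmpd-A');
INSERT INTO structure VALUES (2, 'cmpd-B');
INSERT INTO assay VALUES (1, 250.0);
""")

# Join the two tables on the shared key field regno.
rows = conn.execute(
    "SELECT s.name, a.ic50_nm "
    "FROM structure s LEFT JOIN assay a ON a.regno = s.regno "
    "ORDER BY s.regno"
).fetchall()
```

The second row carries None (SQL NULL) in the assay field because cmpd-B has no matching assay record.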

KDD. Knowledge Discovery in Databases. The
application of analysis and data mining techniques
to discover "knowledge" that may be
implicit but undiscovered in large amounts of 
data. 

Key Field. Field in a table that uniquely 
identifies rows in the table (primary key) or 
contributes to uniquely identifying the rows 
(secondary or composite key), or that connects
the given row to data in another table (foreign 
key). Key fields are usually indexed for rapid 
lookup and retrieval. 

K-means Clustering. Type of partitioning
cluster analysis in which an object, such as a 
chemical structure, is placed into one of K 
clusters, based on how similar the structure is 
to the average value (or centroid) of each clus¬ 
ter. The average of the cluster may be an ac¬ 
tual structure itself, in which case the tech¬ 
nique is referred to as K-medoids clustering. 
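
The assign-then-average cycle of K-means can be sketched in plain Python. The descriptor values, the caller-supplied starting centroids, the squared Euclidean distance, and the fixed iteration count are assumptions made to keep the example short and deterministic.

```python
def kmeans(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            clusters[nearest].append(p)
        # New centroid = mean of the members; an empty cluster keeps its centroid.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members
            else centroids[k]
            for k, members in enumerate(clusters)
        ]
    return centroids, clusters

# Four structures described by two hypothetical descriptors each, K = 2.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

For K-medoids, the averaging step would instead pick the member structure closest to the other members of its cluster.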

Linear Notation. Representation of a chem¬ 
ical structure using a linear string of numbers 
and letters. A linear notation is designed to be 
interpreted by a chemist. Thus, a SMILES
string represents a linear notation, while a
Chime string represents a compressed notation.

Linux. Microprocessor version of Unix developed
by Linus Torvalds. Linux is presently
used mostly in network servers and in clusters 
of microcomputers used for large-scale paral¬ 
lel computation. It is gaining status as an al¬ 
ternative to Microsoft and to Unix, for data¬ 
base applications, because Oracle and other 
vendors provide Linux versions. 

Logic in Query Features. Using AND, OR, 
and NOT as modifiers on the application of 
query features. For example, one could run a 
search to select structures that contain "halo¬ 
gen and not primary or secondary amine, or 
not halogen and any amine." The logic can be 
a part of the query substructure, as with 
Markush queries, or it can be part of the SE¬ 
LECT statement. 

Markush Structure. Essentially a generic 
structure, in which a root or parent structure 
plus Rgroups and their members can repre¬ 
sent an entire combinatorial library. Markush 
structures were developed for patent purposes,
and the specifications of substituents are
often more general than in the case of generic 
structures or database queries. Markush 
structures are also used to represent generic 
reactions, in which the reactants and products 
are represented by generic structures. 

Member. A single substituent or moiety in 
an Rgroup collection. If Rx consists of the
substituents (Me, Et, Pr, and Ph), these are
termed the members of Rx.

Metadata—"Data About Data". In a data¬ 
base, metadata describes the structure, for¬ 
mat, storage, access, and various properties of 
other stored data, or it describes the charac¬ 
teristics of the database itself. Metadata is 
sometimes stored in tables termed "dictionar¬ 
ies." As an example, chemical metadata might 
include the tablename, fieldname, and format 
of the field containing the chemical structures. 
This metadata might be stored in a master
table in the database and be stored as property_name
and property_value pairs, which the
application that accesses the database can un¬ 
derstand. 

Middle Tier. Large-scale database applications
often consist of three "tiers" of programs:
(1) the application tier, which interacts
with the user, (2) the database tier, which
interacts with the database management system,
and (3) the middle tier, which sits between
the user application and the database
management. The functions of the middle tier
include (1) receiving, transforming, and passing
queries and commands from the application
tier to the database tier, (2) receiving,
consolidating, and formatting data from the 
database tier and passing it to the user, and (3) 
performing tasks that may be required by the 
application but are not available in the data¬ 
base tier, such as managing hit lists and que¬ 
ries. Middle tier programs are often written in 
Java, a language that runs on many platforms 
and contains Internet features to communi¬ 
cate with the application tier and database 
features (JDBC—Java Database Connectiv¬ 
ity) to connect to the database tier. 

Molecular Connectivity. A class of molecu¬ 
lar descriptors derived from the connection ta¬ 
ble of a structure. For increasing path lengths 
(1-, 2-, 3-bonds, etc.), the molecular connectiv¬ 
ity values are computed as the sum of func¬ 
tions of the connectivity values (number of 
attachments) of the atoms in the path. Molec¬ 
ular connectivity descriptors can be used to 
distinguish structures. As such, they can be 
correlated with physicochemical properties 
that are functions of structure size, linearity, 
and degree of branching. 
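
As one concrete instance, the first-order (one-bond path) connectivity index in the Randic form sums 1/sqrt(d_i * d_j) over all bonds, where d is the number of attachments of each atom in the hydrogen-suppressed graph. The entry above does not fix a specific function, so this formula and the function name chi1 are assumptions of the sketch.

```python
from math import sqrt

def chi1(bonds):
    """First-order connectivity index from a list of (i, j) heavy-atom bonds."""
    degree = {}
    for i, j in bonds:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return sum(1 / sqrt(degree[i] * degree[j]) for i, j in bonds)

# Hydrogen-suppressed graphs of two C4H10 isomers.
n_butane = [(0, 1), (1, 2), (2, 3)]    # linear chain
isobutane = [(0, 1), (0, 2), (0, 3)]   # three carbons on a central atom

chi_linear = chi1(n_butane)
chi_branched = chi1(isobutane)
```

The linear isomer gives the larger value, which illustrates how the index responds to branching for structures with the same formula.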

Morgan Algorithm. A procedure for finding 
a mostly unique (canonical) ordering of atoms
in a structure. It involves an iterative process 
that begins by assigning each atom a score, 
which is initially computed by counting the 
number of neighboring atoms attached. In 
successive iterations, the score at a given atom 
is computed by summing the previous iteration
scores of all the atoms to which it is attached.
Eventually, the order of the scores becomes
invariant (i.e., does not change with
further iteration). At that point, atoms that 
are topologically equivalent have the same 
score. Atoms in the structure are then renum¬ 
bered by the order of their Morgan number. 
One advantage of canonical numbering is that 
a given structure, drawn two different ways 
(i.e., different ordering of the atoms) can be 
reduced to the same Morgan numbering, and 
thus be matched quickly. 
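
The iterative scoring can be sketched as below. One assumption differs slightly from the wording above: the loop stops when the number of distinct scores no longer increases, which is the stopping test commonly used in implementations. Tie-breaking and the final renumbering step are omitted, and the function name is hypothetical.

```python
def morgan_scores(neighbors, max_iter=100):
    """Return Morgan-style scores for atoms given an adjacency mapping."""
    # Initial score: number of attached neighbors.
    scores = {atom: len(nbrs) for atom, nbrs in neighbors.items()}
    n_classes = len(set(scores.values()))
    for _ in range(max_iter):
        # New score: sum of the neighbors' scores from the previous iteration.
        new = {atom: sum(scores[b] for b in nbrs)
               for atom, nbrs in neighbors.items()}
        new_classes = len(set(new.values()))
        if new_classes <= n_classes:
            break  # no further discrimination gained; keep the previous scores
        scores, n_classes = new, new_classes
    return scores

# n-Pentane heavy-atom chain: 0-1-2-3-4.
pentane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
scores = morgan_scores(pentane)
```

The two terminal carbons receive the same score, as do the two carbons next to them, while the central carbon is unique. This mirrors the topological symmetry of the chain.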


Multidimensional Database. A relational 
database in which multiple general types of 
data are stored, indexed, and cross-referenced, 
for use by several different groups. In chemis¬ 
try, an example would be a database contain¬ 
ing reactions, 2D structures, perhaps generic 
structures or libraries, and 3D models. Such a 
database would be used by synthetic, chemical 
informatic, and molecular modeling scientists. 
A data warehouse is often a multidimensional 
database, whereas a data mart is usually sin¬ 
gle-dimensional. 

Multi-tier Architecture. An expansion of a 
client-server architecture to include a middle 
layer of software. The middle tier may run on 
a computer different from either the client or 
server computers. The middle tier isolates the 
client and server programs, so that changes in 
either of them do not require corresponding 
changes in the other. The middle tier acts to
receive, authenticate, and transform data as it 
passes between client and server computers. 
To make middle tier software easy to change 
and maintain, it is often written in Java, a 
modern object-oriented computer language 
that is available free on most of today's com¬ 
puter platforms. 

Object-Oriented Language. Procedure-ori¬ 
ented languages like Fortran, Basic, and C op¬ 
erate by calling functions or subroutines with 
"arguments" that tell the program what to do 
with certain variables in memory—such as 
calculate the molecular weight of a collection 
of atoms. In object-oriented (OO) languages
like C++ and Java, an object such as a mole¬ 
cule, has a molecular weight "method" that is 
specific to that object and is stored with the 
object in memory. In this way, a slightly dif¬ 
ferent object, like a mixture, could have its 
own molecular weight method—perhaps a 
weighted average. 
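
The molecule and mixture example can be sketched with two small classes, shown in Python rather than C++ or Java for brevity. The class names, the mole-fraction weighting, and the rounded atomic weights are assumptions of this illustration.

```python
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "O": 15.999}  # rounded values

class Molecule:
    def __init__(self, atoms):
        self.atoms = atoms  # element symbol -> atom count

    def molecular_weight(self):
        """Sum of atomic weights over all atoms in the molecule."""
        return sum(ATOMIC_WEIGHTS[e] * n for e, n in self.atoms.items())

class Mixture:
    def __init__(self, components):
        self.components = components  # list of (Molecule, mole fraction)

    def molecular_weight(self):
        """Mole-fraction-weighted average of the component weights."""
        return sum(m.molecular_weight() * x for m, x in self.components)

water = Molecule({"H": 2, "O": 1})
ethanol = Molecule({"C": 2, "H": 6, "O": 1})
blend = Mixture([(water, 0.5), (ethanol, 0.5)])

# The same call works on either object; each supplies its own method.
weights = [obj.molecular_weight() for obj in (water, ethanol, blend)]
```

Calling code does not need to know which kind of object it holds, whereas a procedure-oriented version would need a separate function, or an explicit type test, for each case.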

Object Relational Database. A relational 
database in which data can be collected and 
combined with methods to fit an object ori¬ 
ented model. Searching in the database is con¬ 
ducted in the context of the objects and their 
methods, rather than the raw data fields and 
stored procedures. There is usually consider¬ 
able overhead in building, maintaining, and 
using an object relational database, so this 
type of organization has not so far been widely 
used in chemical or biological databases. 

OLAP. OnLine Analytical Processing. An
activity that involves routine searching, anal¬ 
ysis, and reporting on data stored in a large 
database. The database, which may have a 
data mart or data warehouse organization, is 
optimized for the kinds of searches and re¬ 
ports that it supports. It may not be optimal in 
organization for transaction processing (OLTP), 
which may involve registration of small 
amounts of data on an irregular schedule, or 
for data mining, which involves the retrieval 
and analysis of large volumes of data. In chem¬ 
istry, an example of OLAP might be an inven¬ 
tory application in which the chemist draws in 
a structure or substructure, runs one or more 
filters on the resulting result set, and retrieves 
and prints a report of structures and inven¬ 
tory data. 

OLTP. OnLine Transaction Processing. An 
activity that involves registration, update, or 
simple searching in a database of transactions. 
In chemistry, this might be the routine regis¬ 
tration of a new structure and analytical data 
into a chemical database. Such a database is 
optimized for registration and may not be suitable
either for more analytical types of searching
and reporting (OLAP) or for data mining.

Parallel Processing. A technique whereby a 
given computer task is distributed among sev¬ 
eral central processing units (CPUs). The 
CPUs may be part of the same computer (e.g., 
a multiprocessing computer in which several 
CPUs share common memory and physical devices),
or they may consist of several single-processor
computers (a "cluster") that are
networked to rapidly share information and 
disk space. In database management, it is be¬ 
coming increasingly common to have parallel 
copies (replications) of a given database at sev¬ 
eral sites, perhaps worldwide. Special data¬ 
base and networking software provides rapid 
updates of certain information like data and 
periodic updates of other information like 
search indexes. 

Pattern Recognition. The application of 
computers to build descriptive or predictive 
models (i.e., find patterns) of information 
from input datasets. The techniques of pat¬ 
tern recognition overlap those used in statis¬ 
tics, chemometrics, and data mining, and in¬ 
clude data display, description, and reduction, 
unsupervised methods such as cluster analysis,
and supervised methods such as curve
fitting and classification. Engineering applica¬ 
tions of pattern recognition include recogniz¬ 
ing objects in pictures, and character and voice 
recognition. Chemical applications are found 
in the fields of drug discovery, analytical 
chemistry, and chem/bioinformatics. 

Petabyte. One thousand terabytes of data
(10^15 bytes). At present, the largest databases
of chemical and biological information are
gigabytes (10^9 bytes) in size.

Pharmacophore. The minimum amount of
chemical functionality needed in a drug to 
elicit a given biological response. This func¬ 
tionality is defined in terms of atoms and func¬ 
tional groups and their geometric relation¬ 
ships to each other, including distances, 
angles, etc. A pharmacophore query is the rep¬ 
resentation of a pharmacophore in a format 
that can be used to search a chemical database 
for structures that can satisfy the pharmacophore
and may elicit the desired response.
Pharmacophore searching is usually con¬ 
ducted on a 3D structural database using 
search software that combines 2D searching 
with conformational analysis to find struc¬ 
tures that can, by rotating about single bonds, 
adopt a conformation that satisfies the phar¬ 
macophore. 

Pharmacophore Keys. Originally designed 
to speed pharmacophore query searching, 
pharmacophore keys are bitset fingerprints 
that indicate the presence or absence of given
3- or 4-point pharmacophores in a structure
stored in a database. The 3-point pharmacophore
keys represent triangular arrangements of
atoms and functional groups separated by
given distances or distance ranges. The
4-point pharmacophore keys represent tetrahedral
arrangements of atoms and functional
groups. When a structure is registered into a 
3D database, a rapid conformational analysis 
is performed involving key single bonds in the 
structure. From the interatomic distance 
ranges between given atoms and functional 
groups in the structure, the various bits in the 
pharmacophore keys are set. These keys can 
be used as filters in pharmacophore searching, 
or increasingly, as filters before docking the 
structures into a known receptor (virtual 
screening). Pharmacophore keys have also 
been used less commonly as descriptors in
QSAR and data mining. 
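
The bit-setting step described above can be sketched as follows. This is a minimal illustration, not the scheme of any particular vendor: the key length, the bin width, the feature names, and the use of a CRC hash to pick a bit are all assumptions, and only a single conformer is processed. A real system would combine the bits found over the distance ranges from its conformational analysis.

```python
from itertools import combinations
from math import dist
import zlib

N_BITS = 1024      # key length (assumed for illustration)
BIN_WIDTH = 2.0    # distance bin width in angstroms (assumed)

def three_point_key(features):
    """Return the set of 'on' bit positions for one conformer.

    features: list of (feature_type, (x, y, z)) tuples.
    """
    bits = set()
    for a, b, c in combinations(features, 3):
        # Describe each triangle side by its two end types and binned length.
        sides = []
        for (t1, p1), (t2, p2) in ((a, b), (b, c), (a, c)):
            pair = tuple(sorted((t1, t2)))
            sides.append((pair, int(dist(p1, p2) // BIN_WIDTH)))
        # Sorting the sides makes the label independent of input order.
        label = repr(sorted(sides)).encode()
        bits.add(zlib.crc32(label) % N_BITS)
    return bits

# Hypothetical feature points forming a 3-4-5 triangle.
features = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (3.0, 0.0, 0.0)),
    ("aromatic", (0.0, 4.0, 0.0)),
]
key = three_point_key(features)
```

Because every triangle is reduced to a sorted label before hashing, the same arrangement sets the same bit regardless of the order in which the features are listed.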

Physicochemical Properties. Originally these 
were just measured properties like melting 
point, pKa, solubility, and octanol/water LogP. 
Increasingly, they are obtained from pro¬ 
grams that can calculate them from the 2D or 
3D structure. In QSAR, the classical triad of 
steric, electronic, and lipophilic properties is 
still widely used, but it has been enhanced to 
include energy-based descriptors, measures of
binding interaction, 3D-QSAR multivariate 
descriptors (CoMFA), and others. Quantum 
mechanical calculations are being used in¬ 
creasingly to estimate physicochemical prop¬ 
erties. Once calculated, the properties are 
used to filter structures which may have unde¬ 
sirable ADMET criteria (e.g., the "rule of 
five"), or they may be used directly in models 
to estimate the type or level of biological activ¬ 
ity (QSAR). 
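
As an illustration of property-based filtering, the sketch below counts "rule of five" violations from precalculated properties. The four cutoffs are the commonly cited Lipinski thresholds; the class name, the property values, and the choice to tolerate at most one violation are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class Props:
    """Precalculated properties for one structure (hypothetical container)."""
    mol_weight: float
    logp: float
    h_donors: int
    h_acceptors: int

def ro5_violations(p):
    """Count how many of the four rule-of-five cutoffs are exceeded."""
    return sum([
        p.mol_weight > 500,
        p.logp > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ])

def passes_ro5(p, max_violations=1):
    """Keep a structure if it exceeds no more than max_violations cutoffs."""
    return ro5_violations(p) <= max_violations

# Made-up property values for two structures.
small = Props(mol_weight=350.0, logp=2.1, h_donors=2, h_acceptors=5)
large = Props(mol_weight=720.0, logp=6.3, h_donors=6, h_acceptors=12)
kept = [p for p in (small, large) if passes_ro5(p)]
```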

Pivoting Data. Changing data from row to 
column values or vice versa. This technique 
can be a very useful tool for summarizing data. 
One example of pivoting is to convert sub¬ 
structure keys that are stored by structure 
(with a bit turned on for each key the struc¬ 
ture contains), to storage by key (with one bit 
turned on for each structure that has the given 
key). Another example is to convert assay data 
that is stored by structure, to data that is 
stored by assay. In the process of pivoting 
data, it is common to consolidate values, for 
example, converting raw assay results to ED50
values, or taking the average of some physico¬ 
chemical property. 
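
The first example above, converting keys stored by structure into structures stored by key, can be sketched in a few lines. The structure identifiers and key numbers are invented for illustration, and sets stand in for the bit strings a real system would use.

```python
from collections import defaultdict

# Hypothetical data: structure ID -> substructure keys that are turned on.
keys_by_structure = {
    "S1": {3, 7},
    "S2": {7},
    "S3": {3, 12},
}

def pivot(by_structure):
    """Turn 'keys per structure' into 'structures per key'."""
    by_key = defaultdict(set)
    for structure_id, keys in by_structure.items():
        for key in keys:
            by_key[key].add(structure_id)
    return dict(by_key)

structures_by_key = pivot(keys_by_structure)
```

After pivoting, finding every structure that contains key 7 is a single lookup rather than a scan over all structures.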

Proteomics. The conversion of protein se¬ 
quence data into useful biological informa¬ 
tion. In general, the goal of proteomics is to 
characterize a gene product—i.e., protein—as 
to its structure, subcellular location, and func¬ 
tion. Additional information includes how a 
protein interacts ("networks") with other re¬ 
actions and cell processes. 

QSAR. Quantitative structure-activity re¬ 
lationships—the science of deriving quan¬ 
titative linear or nonlinear mathematical rela¬ 
tionships between physicochemical and topo¬ 
logical/topographical properties of chemical 
structures and their biological activity. Origi¬ 
nally, regression analysis was the only tool 
used to derive QSAR equations. More recently,
tools such as partial least squares
(PLS), neural networks, and a variety of data 
mining methods like decision trees and sup¬ 
port vector machines have come into use. 

Reacting Center. An atom in a reactant 
which is modified during the course of a reac¬ 
tion. Specifying reacting center information 
when searching for reactions can speed the 
search and reduce the number of incorrect 
hits. 

Reaction Scheme. A series of one-step reac¬ 
tions that lead from a given reactant to a given 
product, by way of intermediate steps. A reac¬ 
tion search system should be able to find reac¬ 
tant/product combinations that span several 
intermediate reactions. 

Refine a Search Query. The process of add¬ 
ing or modifying constraints of a search query 
to reduce or increase the number of hits. Con¬ 
straints may be added, removed, relaxed, or 
tightened to achieve the desired search re¬ 
sults. 

Registry Number. A unique identifier as¬ 
signed to a chemical structure or other piece of 
data when it is registered into a database. The 
registry number may be internal, primarily 
for use by the database search system, or ex¬ 
ternal, to be used by chemists and to link the 
data to other databases and files. 

Relational Database. A common database
architecture in which related data items are
stored in separate tables, accessed by key
fields, and indexed for rapid search and retrieval.
The dominant relational database systems
used in pharmaceutical discovery include
Oracle, Microsoft Access and SQL Server,
and IBM DB2.

Result Set. A list of records resulting from a 
database search. The result set commonly con¬ 
sists of a list of record identifiers (sometimes 
called a cursor), which can be navigated to se¬ 
lect records. In some systems, a result set may 
also contain related data for each record. 

Retrosynthetic Analysis. An approach to 
computer-assisted synthesis design that starts 
with the products of a reaction or sequence of 
reactions and works backwards toward the re¬ 
actants. An example program that imple¬ 
ments retrosynthetic analysis is the LHASA 
program of E. J. Corey’s group. 

Rgroup. In a generic or Markush structure, 
generalized substituents or moieties are given 
the representation R1, R2, etc. These Rgroups
represent collections of specific substituents 
or moieties (members) that can be replaced at 
the given position. 

Roll-up. The agglomeration, summariza¬ 
tion, or consolidation of data into a summary 
presentation. Roll-up often involves summa¬ 
rizing data at a given level in a data hierarchy. 
Examples would include the average of several 
ED50 values, or a simple yes/no indication that
toxicity data for a given structure exists some¬ 
where in the database. 
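
Both examples in this entry can be sketched together: averaging several ED50 values per structure and reducing toxicity records to a yes/no flag. The record layout and field names are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical detail records: (structure ID, data type, value).
records = [
    ("S1", "ED50", 1.0),
    ("S1", "ED50", 3.0),
    ("S1", "toxicity", 0.5),
    ("S2", "ED50", 10.0),
]

def roll_up(records):
    """Summarize detail records to one row per structure."""
    ed50 = defaultdict(list)
    has_tox = defaultdict(bool)
    for structure_id, kind, value in records:
        # Record whether any toxicity entry exists for this structure.
        has_tox[structure_id] |= (kind == "toxicity")
        if kind == "ED50":
            ed50[structure_id].append(value)
    return {
        sid: {
            "mean_ed50": mean(ed50[sid]) if ed50[sid] else None,
            "has_toxicity_data": has_tox[sid],
        }
        for sid in has_tox
    }

summary = roll_up(records)
```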

Root Structure. The invariant portion of a 
Markush or generic structure. The attached 
Rgroups contain the substituents that vary 
from one specific structure to the next. Some¬ 
times termed a parent structure. 

ROSDAL. Linear notation scheme devised 
by the Beilstein Institute. It can contain just 
connection table information, or it may also 
contain atom coordinates. Several chemical 
information systems can convert ROSDAL 
strings to other structure file formats. 

Sgroup Data. In MDL structure storage, 
the attachment of structure-differentiating 
data directly to the structure. Such data may 
relate to the structure as a whole, or to atoms, 
bonds, fragments, or collections of atoms and 
bonds. Examples would include atomic partial 
charges on 3D models or percent composition 
attached to components of a formulation. 

Similarity Search. A type of "fuzzy" struc¬ 
ture searching in which molecules are com¬ 
pared with respect to the degree of overlap 
they share in terms of topological and/or phys¬ 
icochemical properties. Topological descrip¬ 
tors usually consist of substructure keys or 
fingerprints, in which case a similarity coeffi¬ 
cient like the Tanimoto coefficient is com¬ 
puted. In the case of calculated properties, a 
simple correlation coefficient may be used. 
The similarity coefficient used in a similarity 
search can also be used in various types of 
cluster analysis to group similar structures. 

SLN —Sybyl Line Notation. Linear notation 
used in conjunction with Tripos SPL (Sybyl 
Programming Language) to manipulate 
chemical structures. It is similar in syntax to 
SMILES notation. 

SMILES. Simplified Molecular Input Line 
Entry System—linear notation used in Daylight
Chemical Information Systems software
and widely supported by other systems. 

SQL —Structured Query Language. The 
standard query specification language for 
searching relational databases. Most database 
systems support the SQL standard but then 
add extensions particular to their implemen¬ 
tation. 

Star Schema. A standard data warehouse 
architecture, characterized by Ralph Kimball, 
in which a central "fact" table is connected to 
various "dimension" tables. 
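
A very small star schema can be sketched with Python's built-in sqlite3 module. All table names, column names, and values here are hypothetical; the point is only the shape, with one fact table of assay results joined to two dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_compound (compound_id INTEGER PRIMARY KEY, registry_no TEXT);
CREATE TABLE dim_assay (assay_id INTEGER PRIMARY KEY, assay_name TEXT);
-- Central fact table: one row per measurement, keyed to the dimensions.
CREATE TABLE fact_result (
    compound_id INTEGER REFERENCES dim_compound(compound_id),
    assay_id INTEGER REFERENCES dim_assay(assay_id),
    value REAL
);
INSERT INTO dim_compound VALUES (1, 'REG-0001'), (2, 'REG-0002');
INSERT INTO dim_assay VALUES (1, 'binding'), (2, 'solubility');
INSERT INTO fact_result VALUES (1, 1, 0.5), (1, 1, 1.5), (2, 1, 4.0), (1, 2, 30.0);
""")

# Summarize facts by joining outward to each dimension table.
rows = conn.execute("""
SELECT c.registry_no, a.assay_name, AVG(f.value)
FROM fact_result AS f
JOIN dim_compound AS c ON c.compound_id = f.compound_id
JOIN dim_assay AS a ON a.assay_id = f.assay_id
GROUP BY c.registry_no, a.assay_name
ORDER BY c.registry_no, a.assay_name
""").fetchall()
```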

Structural Data Mining. Application of data 
mining methodology to chemical structure 
and reaction databases. Currently in its in¬ 
fancy, it remains to be seen whether a "data 
snooping" approach to information and 
knowledge discovery can be as useful in drug 
discovery as it has proven to be in finance, 
marketing, and merchandising. 

Substance. In some structure databases, an 
entry that lacks a structure completely (a 
"nostructure"), is only partially character¬ 
ized, or is an unspecified mixture of known 
structures. Substances pose obvious problems 
in database searching. 

Substructure Search (SSS) Keys. Originally 
developed to facilitate substructure searching, 
these consist of a string of bits that represent a 
fingerprint of the structure with respect to ei¬ 
ther (1) a set of known and defined functional 
groups (e.g., MDL), or (2) a set of discovered 
atom-bond paths that the structure contains 
(e.g., Daylight). In MDL systems, the sub¬ 
structure keys are currently either 166 or 960 
bits in length. In Daylight systems, the sub¬ 
structure keys are of varying length and can 
be "folded" to achieve a higher density of bits 
turned on. Although SSS keys were originally 
developed to screen candidates for substruc¬ 
ture searching, they are currently used more 
for similarity calculations. 
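
Two operations mentioned in this entry, screening and folding, can be sketched with fingerprints held as Python integers. The bit positions and lengths are invented, and the folding shown (OR-ing the upper half onto the lower half) is one simple way to fold, assumed here for illustration rather than taken from any vendor's documentation.

```python
def may_contain(query_fp, candidate_fp):
    """Screen: a candidate can match only if it has every bit the query has."""
    return (query_fp & candidate_fp) == query_fp

def fold(fp, n_bits):
    """Halve the fingerprint length by OR-ing the upper half onto the lower."""
    half = n_bits // 2
    return (fp >> half) | (fp & ((1 << half) - 1))

def density(fp, n_bits):
    """Fraction of bits that are turned on."""
    return bin(fp).count("1") / n_bits

# Hypothetical 16-bit fingerprint with bits 1 and 12 turned on.
fp = (1 << 1) | (1 << 12)
folded = fold(fp, 16)          # 8 bits long, bits 1 and 4 turned on
```

A candidate that fails the screen cannot contain the query substructure, so the slower atom-by-atom search is run only on the candidates that pass.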

Substructure Search. Application of "subgraph
isomorphism" search to chemical structures.
This consists of finding a particular arrangement
of atoms and bonds as they are
embedded in a chemical structure. The ar¬ 
rangement being searched for is termed the 
query substructure, the structures being 
searched are termed the candidates, and any 
particular structure in that set is termed a 
target structure. If the query substructure is 
found in the target, the target structure is
added to the result set (or hit list). In display¬ 
ing the results of the search, the atoms and 
bonds in the substructure as mapped onto the 
hit may be highlighted (shown darkened or in 
a different color) in the structure display. In 
general, more than one occurrence of a sub¬ 
structure may be found in a given structure, 
and substructure mappings may be overlap¬ 
ping or non-overlapping. 

Superstructure Search. Modification of sub¬ 
structure search in which the substructure 
query becomes the target structure, and the 
target structure in the database becomes the 
substructure search query. The search finds 
structures in the database that are substruc¬ 
tures of the query. A similar extension to 
structure similarity searching yields super¬ 
structure similarity searching. 

Supervised Data Mining. Searching large 
volumes of data for hidden predictive relation¬ 
ships. Supervised analysis requires one or 
more "dependent" or response variables, to be 
predicted from a set of "independent" or pre¬ 
dictor variables. The techniques used include 
various classification methods (decision tree, 
support vector, Bayesian) and various estima¬ 
tion methods (regression, neural nets). 

Tanimoto Coefficient. Standard coefficient 
for computing the similarity of chemical struc¬ 
tures. If structure A has 20 bits turned on in a 
fingerprint, and structure B has 30 bits turned 
on, and the two structures have 10 bits in common,
the Tanimoto coefficient is 10/(20 + 30 -
10) or 0.25. Its value can range from 0 (no
similarity) to 1.0 (perfect match). Other simi¬ 
larity coefficients are also used, and in some 
systems (such as MDL) the various bits are 
weighted inversely according to their occur¬ 
rence in the database, so that very common 
substructures do not contribute much to the 
similarity. 
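
The unweighted coefficient can be sketched as the number of shared bits divided by the number of bits turned on in either fingerprint. The fingerprints below are invented so that 20 and 30 bits are on with 10 in common; the weighting used in some systems is not shown.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    shared = len(fp_a & fp_b)
    either = len(fp_a) + len(fp_b) - shared
    # Two empty fingerprints are treated as having no similarity.
    return shared / either if either else 0.0

fp_a = set(range(0, 20))     # bits 0..19: 20 bits on
fp_b = set(range(10, 40))    # bits 10..39: 30 bits on, 10 shared with fp_a
similarity = tanimoto(fp_a, fp_b)   # 10 / (20 + 30 - 10) = 0.25
```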

Terabyte. One thousand gigabytes, or 10^12
bytes of data. The largest relational databases
of any kind are currently a few tens of terabytes
in size. At present, the largest databases
of chemical and biological information
are gigabytes (10^9 bytes) in size.

Thick or Thin Client. A thick client architec¬ 
ture is one in which a significant amount of 
computing is done on the user's workstation. 
This is appropriate for user-interface-intensive
activity like graphics and calculations on
individual molecules. The alternative thin-client
architecture either does not require much
local computing, or it uses a built-in resource 
like an Internet browser as a client. 

Toolkit. A collection of computer routines 
that each perform one or a small number of 
information management tasks. The routines 
are provided as a library and they can be in¬ 
corporated into custom user-written applica¬ 
tion programs to carry out tasks that ordinary 
application programs may not perform. The 
interface between the toolkit routines and the 
user-written program is referred to as the Ap¬ 
plication Programming Interface, or API. 

Topographical. Structure data that is 
based on the connection table and the 3D 
structure of a molecule. Examples include sur¬ 
face area and volume and pharmacophore dis¬ 
tances between atoms. 

Topological. Structure data that is based 
only on the connection table of the structure, 
without regard to 2D or 3D coordinates of the 
atoms. Examples include molecular weight 
and formula, counts of substructures, and in¬ 
dices like molecular connectivity. 

Tree. A data structure that is widely used 
in chemical information storage. Commonly 
viewed with the root of the tree at the top (or 
to the left), successive levels of branching lead 
to the "nodes" of the tree and ultimately to its 
"leaves" (terminal nodes). Depending on how 
they split at a node, trees may be binary or 
n-ary, and depending on how their nodes are
distributed, they may be balanced or unbalanced.
A tree is usually traversed from the
root to the leaves, and this traversal can be
depth-first (following a single path until a leaf
node is reached), or breadth-first (looking at 
all the nodes at a given level). An example of a 
tree data structure is the fastsearch index 
used in MDL substructure searching. 
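
The two traversal orders can be sketched on a small tree stored as a dictionary of child lists. The node names are arbitrary, and this is not a model of any real search index.

```python
from collections import deque

# Hypothetical tree: each node maps to its list of children.
tree = {
    "root": ["A", "B"],
    "A": ["A1", "A2"],
    "B": ["B1"],
    "A1": [], "A2": [], "B1": [],
}

def depth_first(tree, node):
    """Follow each path down to a leaf before moving to the next sibling."""
    order = [node]
    for child in tree[node]:
        order.extend(depth_first(tree, child))
    return order

def breadth_first(tree, root):
    """Visit all nodes at one level before descending to the next."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order

dfs_order = depth_first(tree, "root")    # root, A, A1, A2, B, B1
bfs_order = breadth_first(tree, "root")  # root, A, B, A1, A2, B1
```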

Unicode. A 32-bit successor to the ASCII 
character set. With Unicode, foreign alphabets 
and special characters can be encoded. 

Unix. Widely used operating system for 
workstations and server computers. Various 
computer vendors supply their version of 
Unix, which typically descends from either the 
Bell Labs or Berkeley versions. A microcom¬ 
puter version of Unix is Linux, which is rap¬ 
idly growing in acceptance. 





Unsupervised Data Mining. Searching large 
volumes of data for hidden descriptive rela¬ 
tionships. Unlike supervised data mining, no 
response variables are used. The techniques 
used include various display and data reduc¬ 
tion methods, as well as cluster analysis and 
association analysis. 

VARCHAR, VARCHAR2. SQL data types 
used to store character data in a relational da¬ 
tabase system. Storage is limited to about 
4000 characters, so larger pieces of data must 
be stored as CLOB data. 

Virtual Screening. Using computer model¬ 
ing to screen leads for activity. The screening 
may be through some QSAR or data mining 
model, which typically requires only 2D struc¬ 
tures and data, or it may involve 3D molecular 
modeling and docking with a known or puta¬ 
tive receptor. The speed and increasing accu¬ 
racy of virtual screening make it a vital step in 
the drug discovery process. 

XML. Extensible Markup Language—a 
widely used standard for producing self-docu¬ 
menting text. Documents that subscribe to the 
XML standard can be freely exchanged over 
networks and between applications, using 
standard parsing programs to interpret the 
document. CML is an extension of XML that 
can be used to transport structures, reactions, 
and chemical and biological data. A query lan¬ 
guage, XMLQuery, is being developed to allow 
searching of XML documents in a manner sim¬ 
ilar to the use of SQL to search relational da¬ 
tabases. 
1 INTRODUCTION

Structure-based drug design by use of struc¬ 
tural biology remains one of the most logical 
and aesthetically pleasing approaches in drug 
discovery paradigms. The first paper on the 
potential use of crystallography in medicinal 
chemistry was written in 1974 (1) and was presented
at Professor Alfred Burger's retirement
symposium in 1972. The excerpted last
paragraph in the paper, reproduced below, 
foresaw the integration of X-ray crystallogra¬ 
phy into the field of medicinal chemistry. 

It is reasonable to assume then that the future of 
large molecule crystallography in medical chem¬ 
istry may well be of monumental proportions. 
The reactivity of the receptor certainly lies in the 
nature of the environment and position of vari¬ 
ous amino acid residues. When the structured 
knowledge of the binding capabilities of the ac¬ 
tive site residues to specify groups on the agonist 
or antagonists becomes known, it should lead to 
proposals for syntheses of very specific agents 
with a high probability of biological action. Combined
with what is known about transport of
drugs through a Hansch-type analysis, etc., it is
feasible that the drugs of the future will be tailor-made
in this fashion. Certainly, and unfortunately,
however, this day is not as close as one
would like. The X-ray technique for large molecules,
crystallization techniques, isolation techniques
of biological systems, mechanism studies
of active sites and synthetic talents have not
been extensively intertwined because of the
existing barriers (1).

Since that time there have been numerous 
successes in advancing new agents into clini¬ 
cal trials by combining crystallography with 
associated fields in drug discovery. Currently, 
more structures are solved every year than 
were in the entire Protein Data Bank in 1972. 
Although almost every major pharmaceutical 
company has an X-ray diffraction group, Agouron
(now Pfizer) was the first biotechnology
startup company to make drug discovery 
based on X-ray crystallography a central and 
primary theme (2). Other startup companies
(such as BioCryst and Vertex) were soon
founded to apply similar approaches. More re¬ 
cent companies, such as Structural Genomix 
(3) and Astex (4), and the High Throughput 
Crystallography Consortium, organized by 
Accelrys (5), have emerged to carry on struc¬ 
ture-based drug discovery in a high through¬ 
put environment. Third-generation synchro¬ 
tron sources, such as the Advanced Photon 
Source (APS) at Argonne National Laboratory 
outside Chicago, and new detectors, have 
enormously increased the speed of data collec¬ 
tion. It is now possible to collect high resolu¬ 
tion data from protein crystals, solve, and re¬ 
fine the structure in days to a few months. 
This information is covered in an adjacent 
chapter. Simultaneous advances in computing 
have added to the speed of obtaining three- 
dimensional structural information on inter¬ 
esting drug design targets. These develop¬ 
ments, coupled with the sequencing of the 
human genome and the advent of bioinformat¬ 
ics, provide workers in structure-based drug 
design with a plethora of opportunities for 
success. 

The utility of any drug discovery tool is 
measured, in the final analysis, by the output 
of the tool's use. New tools are burdened with 
unrealistically high expectations. As their ap¬ 
plication begins, the impact is sometimes 
more limited than was originally envisaged. 
Structure-based design methods have had 
great utility for the design of enzyme inhibi¬ 
tors, tight-binding receptor ligands, and novel 
proteins. The utility of these methods for the 
design of drugs is somewhat more limited, 
simply because there are so many factors that 
must be balanced in the successful design of a 
drug. Nonetheless, structure-based drug de¬ 
sign (SBDD), distinct from the (far easier) 
structure-based inhibitor design, is now a re¬ 
ality and has had significant impact. Aspects 
of the methods and utility of SBDD have been 
described in several excellent review articles 
and monographs (6-12). This chapter focuses 
on the utility of SBDD in the cases of drugs 
that have been launched as products, or that
have at least entered human clinical trials. In 
some cases, SBDD has been a remarkable suc¬ 
cess. In others, it has failed in the sense that, 
despite its use, the candidate produced did not 
gain approval to become a marketed drug. In 
the latter cases, this was usually not truly a 
failure of SBDD, but rather attributed to the 
complex criteria that drugs must meet, and to 
the complex regulatory hurdles that candi¬ 
dates and companies face. 

In addition to providing a measure of the 
impact of SBDD on the creation of actual 
drugs, these examples will also provide lessons 
about how to apply SBDD in drug discovery. 
The chapter is not completely encyclopedic, 
and some significant instances of SBDD will 
be missed by the informed reader. However, 
the discovery programs with drugs and drug 
candidates that are discussed will provide suf¬ 
ficient diversity that general trends can 
emerge. In a few cases, relevant results for 
preclinical compounds that seem likely to en¬ 
ter human trials are described. A growing 
number of the drugs to which structural de¬ 
sign methods are applied are themselves pro¬ 
teins (e.g., cytokines, immunomodulators, 
monoclonal antibodies). However, this chap¬ 
ter is restricted to small organic molecules 
that are designed by use of the three-dimen¬ 
sional structure of a target protein. 

2 STRUCTURE-BASED DRUG DESIGN 

2.1 Theory and Methods 

The concept of structure-based drug discovery 
combines information from several fields: X- 
ray crystallography and/or NMR, molecular 
modeling, synthetic organic chemistry, quantitative
structure-activity relationships (QSAR),
and biological evaluation. Figure 10.1 repre¬ 
sents a general road map where a cyclic pro¬ 
cess refines each stage of discovery. Initial 
binding site information is gained from the 
three-dimensional solution of a complex of the 
target with a lead compound(s). Molecular 
modeling is usually next applied with the in¬ 
tent of designing a more specific ligand(s) with 
higher affinity. Synthesis and subsequent in 
vitro biological evaluation of the new agents 
produces more candidates for crystallographic 
or NMR analysis, with the hope of correlating
the biological action with precise structural
information. It makes good sense at the early 
stages of design to use lead molecule structural
scaffolds that retain low toxicity profiles,
given that toxicity most often derails successful
drug discovery. The most active derivative(s)
from this cyclic process can be forwarded
for in vivo evaluation in animals.

2.2 Hemoglobin, One of the First 
Drug-Design Targets 

2.2.1 History. Perutz and colleagues de¬ 
termined the first three-dimensional structures 
of proteins. Through use of X-ray crystallogra¬ 
phy Kendrew determined the structure of myo¬ 
globin (13), whereas Perutz determined the 
structure of hemoglobin (Hb) (14-16). At the 
present time, new protein and nucleic acid 
structures and complexes are published 
weekly. However, for a long period after the 
first protein structures were solved, progress 
was slower. Hb was of interest for drug discovery
purposes because of the early identification
of the mutant β6 Glu → Val, which causes
sickle-cell anemia. The crystal structure of
sickle Hb (HbS) was published by Wishner et
al. (17) and was later solved at a higher reso¬ 
lution by Harrington et al. (18). 

2.2.2 Sickle-Cell Anemia. In 1975, through 
use of the three-dimensional Hb coordinates, 
two groups initiated SBDD studies to discover 
an agent to treat sickle-cell anemia: Goodford 
et al. in England and Abraham et al. in the 
United States. Goodford's group was the first 
to develop an antisickling agent (BW12C), 
based on structure-based drug design, which 
reached clinical trials (19, 20). However, 
Wireko and coworkers were unable to confirm 
the BW12C binding site proposed by Goodford
(21). The second antisickling agent proposed
by Abraham et al. to advance to clinical trials
was the food additive vanillin (compound 1a)
(22). The crystallographic binding site of
BW12C (1b) was found to be at the N-terminal
amino groups of the α-chains (21), whereas
that of vanillin shows binding close to αHis103
and also at a minor site between βHis116 and
βHis117 (22). A recently redetermined binding
site of vanillin at a higher resolution shows
weak binding to the N-terminal amino group
of the α-chain (23). A derivative of vanillin has
been patented and is a candidate for clinical
trials.

Figure 10.1. Schematic of the structure-based drug discovery/design process. The figure maps out
the iterative steps that make use of X-ray crystallography, molecular modeling, organic synthesis,
and biological testing to identify and optimize ligand-protein interactions.

(1b) BW12C
(2) ethacrynic acid
(3) clofibric acid

Two marketed medicines, ethacrynic acid
(2), a diuretic agent, and clofibric acid (3), an
antilipidemic agent, were reported to have
strong antigelling activity (24, 25), and
through X-ray analyses of cocrystals, the binding
sites of these agents to Hb were elucidated
(26). Unfortunately, it was found that high
concentrations of ethacrynic acid were needed 
to interact with Hb in deformed red cells (27). 
Clofibric acid, when administered in a 2 g/day
dose (as the ethyl ester clofibrate), appeared
to be an ideal potential treatment for
sickle-cell anemia, but was subsequently 
found to be highly bound to serum proteins 
and not transported in quantities sufficient to 
interact with sickle Hb. Furthermore, struc¬ 
ture-based derivatives were not found to be 
effective (28, 29). 

The major problem with designing a small 
molecule to treat sickle cell anemia is not so 
much an issue of specificity, but arises from 
the treatment of a chronic disease. The poten¬ 
tial cumulative toxicity from the amount of 
drug needed to interact with approximately 
two pounds of hemoglobin S over a homozy¬ 
gous patient's lifetime is the major concern 
(22) (for a review, see Vol. 3, Chapter 10,
Sickle Cell Anemia, by Alan Schecter et al.).

2.2.3 Allosteric Effectors. 2,3-Diphosphoglycerate
(2,3-DPG, compound 4), found in
most mammalian red cells, is the naturally occurring
allosteric effector for Hb. Its physiological
role is to right shift the Hb oxygen-binding
curve to release more oxygen. The
binding site of 2,3-DPG, determined by Arnone
(30), lies on the dyad axis at the mouth of
the β-cleft (Fig. 10.2), interacting with the N-terminal
βVal1, βLys82, and βHis143 of deoxy
Hb. A more recent study at a higher resolution,
by Richard et al. (31), found DPG to interact
with the residues βHis2 and βLys82.
Goodford and colleagues were the first to de¬ 
sign agents that would bind to the 2,3-DPG 
site (32-34). An effective allosteric effector 
that can enter red cells might be used to treat 
hypoxic diseases such as angina and stroke, to 
enhance radiation treatment of hypoxic tu¬ 
mors, or to extend the shelf life of stored blood. 

Many antigelling agents left shift the oxy¬ 
gen binding curve, producing higher concen¬ 
trations of oxy-HbS. Given to patients with 
sickle-cell anemia, this should result in less 
polymerization, and therefore less red blood 
cell sickling. It was a surprise therefore when 
clofibric acid, which blocks sickle-cell Hb poly¬ 
merization, was found to shift the Hb oxygen 
binding curve to the right, in a manner similar 
to that of 2,3-DPG (25). The clofibric acid 
binding site was found to be far removed from 
the 2,3-DPG site (25, 35). The determination 
of the clofibric acid binding site on Hb was the 
first report of a tense state (deoxy state) allosteric
binding site different from that of 2,3-DPG
(compound 4). Perutz and Poyart tested
another antilipidemic agent, bezafibrate (compound
5), and found that it was an even more
potent right-shifting agent than clofibrate
(36). Perutz et al. (26) and Abraham (35) determined
the binding site of bezafibrate and
found it to link a high occupancy clofibrate site
with a low occupancy site. Lalezari and Lalezari
synthesized urea derivatives of bezafibrate
(37), and with Perutz et al. determined the
binding site of the most potent derivatives
(38). Although these compounds were extremely
potent, they were hampered by serum
albumin binding (39, 40).

Figure 10.2. View of (4) (2,3-DPG) binding site at
the mouth of the β-cleft of deoxy hemoglobin. See
color insert.

Abraham and coworkers synthesized a series
of bezafibrate analogs (39-42). One of
these agents, efaproxiral (RSR 13, compound
6a), is currently in Phase III clinical trials for
radiation treatment of metastatic brain tumors
(see Vol. 4, Chapter 4, Radiosensitizers
and Radioprotective Agents, by Edward Bump
et al.). The binding constants and binding sites
of a large number of these bezafibrate analogs
were measured and agreed with the number of
crystallographic binding sites found (42). The
degree of right shift in the oxygen-binding
curve produced by these compounds was not
solely related to their binding constant, providing
a structural basis for E. J. Ariens' theory
of intrinsic activity (42).

(6a) (RSR-13) R = 3,5-dimethyl
(6b) (RSR-56) R = 3,5-dimethoxy
(7a) (MM-30) R = 3,5-dichloro
(7b) (MM-25) R = 4-chloro

By use of X-ray crystallographic analyses,
the key elements linking allosteric potency
with structure were uncovered. In addition,
the computational program HINT, which
quantitates atom-atom interactions, was used
to determine the strongest contacts between
various bezafibrate analogs and Hb residues.
These analyses revealed that the amide linkage
between the two aromatic rings of the
compounds must be oriented so that the carbonyl
oxygen forms a hydrogen bond with the
side-chain amine of αLys99 (41, 43). Three
other important interactions were found. The
first are the water-mediated hydrogen bonds
between the effector molecule and the protein,
the most important occurring between the effector's
terminal carboxylate and the side-chain
guanidinium moiety of residue αArg141.
Second, a hydrophobic interaction involves a
methyl or halogen substituent on the effector's
terminal aromatic ring and a hydrophobic
groove created by Hb residues αPhe36,
αLys99, αLeu100, αHis103, and βAsn108.
Third, a hydrogen bond is formed between the
side-chain amide nitrogen of βAsn108 and the
electron cloud of the effector's terminal aromatic
ring (40, 41, 43). Abraham first observed
this last interaction while elucidating the Hb
binding site of bezafibrate (36). Burley and
Petsko had previously pointed out this type of
hydrogen bond in a number of proteins, indicating
that this contact is involved in a number
of other receptor interactions (44, 45). Perutz
and Levitt estimated this bond to be
about 3 kcal/mol (46). Figure 10.3 shows the
overlap of four allosteric effectors (6a, 6b, 7a,
and 7b) that bind at the same site in deoxy Hb
but differ in their allosteric potency.
Figure 10.3. Stereoview of allosteric binding site in deoxy hemoglobin. A similar compound
environment is observed at the symmetry-related site, not shown here. (a) Overlap of four right-shifting
allosteric effectors of hemoglobin: (6a) (RSR13, yellow), (6b) (RSR56, black), (7a) (MM30, red), and
(7b) (MM25, cyan). The four effectors bind at the same site in deoxy hemoglobin. The stronger acting
RSR compounds differ from the much weaker MM compounds by reversal of the amide bond located
between the two phenyl rings. As a result, in both RSR13 and RSR56, the carbonyl oxygen faces and
makes a key hydrogen bonding interaction with the amine of αLys99. In contrast, the carbonyl
oxygen of the MM compounds is oriented away from the αLys99 amine. The αLys99 interaction with
the RSR compounds appears to be critical in the allosteric differences. (b) Detailed interactions
between RSR13 (6a) and hemoglobin, showing key hydrogen bonding interactions that help
constrain the T-state and explain the allosteric nature of this compound and those of other related
compounds. See color insert.
Figure 10.5. Stereoview of the binding site for (9) (n = 3, TB36, yellow) in deoxy Hb. A similar
compound environment is observed at the symmetry-related site, not shown here. One aldehyde is
covalently attached to the N-terminal α1Val1, whereas the second aldehyde is bound to the opposite
subunit, α2Lys99 ammonium ion. The carboxylate on the first aromatic ring forms a bidentate
hydrogen bond and salt bridge with the guanidinium ion of α2Arg141 of the opposite subunit. The
effector thus ties two subunits together and adds additional constraints to the T-state, resulting in a
shift in the Hb allosteric equilibrium to the right. The magnitude of constraint placed on the T-state
by the crosslinked αLys99 varies with the flexibility of the linker. Shorter bridging chains form
tighter crosslinks and yield larger shifts in the allosteric equilibrium. See color insert.


binding curve, are generally consistent with 
the behavior of the allosteric effectors and 
crosslinking agents. 

2.3 Antifolate Targets 

2.3.1 Dihydrofolate Reductase. The re¬ 
duced form of folate (tetrahydrofolate) acts as 
a one-carbon donor in a wide variety of biosyn¬ 
thetic transformations. This includes essen¬ 
tial steps in the synthesis of purine nucleo¬ 
tides and of thymidylate, essential precursors 
to DNA and RNA. For this reason, folate-dependent
enzymes have been useful targets for
the development of anticancer and anti-in¬ 
flammatory drugs (e.g., methotrexate) and 
anti-infectives (trimethoprim, pyrimethamine). 
During the reaction catalyzed by thymidylate 
synthase (TS), tetrahydrofolate also acts as a 
reductant and is converted stoichiometrically
to dihydrofolate. The regeneration of tetrahy¬ 
drofolate, required for the continuous func¬ 
tioning of this cofactor, is catalyzed by dihy¬ 
drofolate reductase (DHFR). 



The first crystal structure of a drug bound 
to its molecular target was provided by the 
pioneering X-ray diffraction study of the com¬ 
plex between DHFR and methotrexate (57), 
albeit in this case the target was a bacterial 
surrogate for the actual target (the human en¬ 
zyme). Once X-ray structures of DHFR from 
eukaryotic sources were also solved, compari¬ 
sons of the bacterial and eukaryotic DHFR 
structures revealed the structural basis for 
the selectivity of the antibacterial drug tri¬ 
methoprim for the bacterial enzyme. This un¬ 
derstanding allowed Goodford and colleagues 
to rationally design trimethoprim analogs
with altered potencies (58). Retrospective 
studies such as those done by David Matthews 
and others on DHFR (see, for example, Ref. 
59) set the stage for the iterative process of 
structure-based inhibitor design as it was 
later developed at Agouron Pharmaceuticals, 
targeted against another folate-dependent en¬ 
zyme, TS (60,61). 

2.3.2 Thymidylate Synthase. There are two 
main modes in which structure-based meth¬ 
ods for inhibitor design have been employed. 
The first mode is structure-guided optimiza¬ 
tion of the design of a previously known chem¬ 
ical scaffold. The scaffold could be a known 
drug or inhibitor, substrate analog, or a hit 
from screening of a random library. The prop¬ 
erty, which is modified during the optimiza¬ 
tion, may be, for example, potency, solubility, 
or target selectivity, or the more challenging 
aim may be to optimize several properties si¬ 
multaneously. A second and potentially more 
powerful mode is the de novo design of inhibitory
ligands, sometimes referred to as lead
generation. This mode relies strictly on the 
structure of the target enzyme or receptor as a 
template. A substrate or an inhibitor may be 
bound to the crystalline target, and deleted to 
provide the template. This is advantageous 
when, as in the case of TS, a substantial con¬ 
formational change occurs when ligands bind. 
After a de novo design process has provided a
new inhibitor that is structurally unique, the 
properties of the new scaffold can be optimized 
by continued structural guidance. Both modes 
of SBDD have been used to generate TS inhib¬ 
itors that have entered clinical trials. 


When the design of inhibitors of human TS 
at Agouron Pharmaceuticals began, the 
amounts of the human enzyme required for 
crystallographic study were unavailable. Be¬ 
cause the active site of the enzyme is so highly 
conserved, it was assumed that an acceptable 
surrogate for human TS would be the crystal 
structure of a bacterial TS (60, 62). Figure 
10.6 shows the conformation of the quinazoline
folate analog 10 (N10-propynyl-5,8-dideazafolate),
bound within the active site
of the Escherichia coli enzyme with the nucle¬ 
otide substrate, 2'-deoxyuridine-5'-mono¬ 
phosphate (63, 64). This folate analog, de¬ 
signed by classical medicinal chemistry as an 
analog of the TS substrate, 5,10-methylene- 
tetrahydrofolate (11), is a potent TS inhibitor. 
Nevertheless, (10) failed as an anticancer drug
because of its insolubility and resulting neph¬ 
rotoxicity (65). 

2.3.2.1 Structure-Guided Optimization: AG85 
and AG337. In the crystalline complex with E. 
coli TS, the quinazoline ring of compound (10) 
binds on top of the pyrimidine of the nucleo¬ 
tide, in a protein crevice surrounded by hydro- 
phobic residues (Fig. 10.6). The bound mole¬ 
cule bends at right angles between the 
quinazoline and 4-aminobenzoyl rings (at 
N10), with the D-glutamate portion extending 
out to the surface of the enzyme. Hydrogen 
bonds are made with several enzyme side- 
chains, the terminal carboxylate, and several 
tightly bound waters. This compound, like fo¬ 
late and most folate analogs, gains entry into 
cells through a transport system that recog¬ 
nizes its D-glutamate moiety, and intracellular 
concentrations are elevated because of trapping
of the compound as highly charged forms
after addition of several additional glutamates
by a cellular enzyme.

(10) N10-propynyl-5,8-dideazafolate (also known as PDDF or CB3717)
(11) 5,10-methylene-5,6,7,8-tetrahydrofolate

TS inhibitors were designed by Agouron scientists
with the aim of providing a drug that could enter
cells passively and thus avoid the need for
transport or polyglutamylation. The first were
designed by structure-guided modification of known
antifolates, and others were designed de novo.
Starting with (12), the glutamate moiety was
deleted from the structure. [Compound (12), the
2-desamino-2-methyl analog of (10), had been found
to be much more water soluble than (10). This
eventually led (65) to AstraZeneca's Tomudex,
which is now approved for treatment of colorectal
cancer in European markets.] Removal of the
glutamate reduced the potency by 2 to 3 orders of
magnitude (Table 10.1, 12 versus 13). The crystal
structure solved by use of (10) indicated
potential interactions that were exploited by
substituents such as the m-CF3 in compound (14).
The phenyl moiety in (15) was added to interact
with Phe176 and Ile79 (Fig. 10.6). Combining
substituents does not necessarily produce the
expected sum of binding free energy (compare 16
with 14 and 15). Structures of the complexes with
several of these compounds revealed that ideal
placement of one group does not always accommo-


Figure 10.6. Binding site for (10) (N10-propynyl-5,8-dideazafolate), within the active site of
thymidylate synthase from Escherichia coli. The surface of the inhibitor is shown in the left view. The red
spheres in the left view are tightly bound water molecules. See color insert.
Table 10.1 SAR for 2-Methyl-4-oxo-quinazoline Inhibitors of TS a
active site cavity, toward bulk solvent, resulted
in (20). The use of an amine for the groups
attached to position 6 of the benz[cd]indole
improved the synthetic ease for variation of these
groups. Compound (20) had a Kis value of 3 µM for
inhibition of human TS and was about 10-fold less
potent against the bacterial enzyme.

(20)

The X-ray structure of (20) bound to E. coli 
TS revealed that the compound actually binds 
more deeply into the active-site crevice than 
had been anticipated. Instead of interacting 
favorably with the enzyme-bound water indi¬ 
cated in Fig. 10.7, the oxygen at position 2 of 
the benz[cd]indole displaces it. This forced the 
Ala263 carbonyl oxygen to move by about 1 Å.
Replacement of the oxygen at position 2 with 
nitrogen provided a significant increase in in¬ 
hibitory potency. Structural studies revealed 
that this also resulted in recovery of the dis¬ 
placed water, and restoration of the original 
position of the Ala263 carbonyl oxygen. The 
substituents at position 5, on the tertiary 
amine nitrogen, and on the sulfonyl group 
were also varied during the iterative optimiza¬ 
tion process. The process yielded (21) 
(AG331), which has a Kis value of 12 nM for
inhibition of human TS. Compound (21) en¬ 
tered clinical trials as an antitumor agent (71). 
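
The potency figures quoted in this section can be put on a free energy scale with the relation ΔG = RT ln K. The sketch below is illustrative only: it assumes that an inhibition constant can be treated as a simple dissociation constant at 298.15 K, and the function names are hypothetical rather than taken from the work described here. It converts the 12 nM value reported for (21) and estimates the free energy cost of the 2 to 3 orders of magnitude lost on removing the glutamate.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(k_molar, temp_k=298.15):
    """Return dG = RT ln(K) in kcal/mol for a dissociation-type constant K in mol/L.

    Assumption: the inhibition constant behaves like a dissociation constant.
    """
    if k_molar <= 0:
        raise ValueError("constant must be positive")
    return R_KCAL * temp_k * math.log(k_molar)


def potency_loss_penalty(fold, temp_k=298.15):
    """Return the free energy cost in kcal/mol of an n-fold loss in potency."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL * temp_k * math.log(fold)


# 12 nM inhibition constant quoted for compound (21)
dg_ag331 = binding_free_energy(12e-9)

# 2 to 3 orders of magnitude, as quoted for removal of the glutamate
penalty_low = potency_loss_penalty(100)
penalty_high = potency_loss_penalty(1000)

print(f"dG for 12 nM: {dg_ag331:.1f} kcal/mol")
print(f"Cost of 100- to 1000-fold loss: {penalty_low:.1f} to {penalty_high:.1f} kcal/mol")
```

On this scale each 10-fold change in potency corresponds to roughly 1.4 kcal/mol at room temperature, which is a convenient yardstick when comparing substituent effects.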

2.3.3 Glycinamide Ribonucleotide Formyltransferase.
Glycinamide ribonucleotide formyltransferase
(GARFT) catalyzes the N-formylation of
glycinamide ribonucleotide, through use of
N10-formyltetrahydrofolate as the one-carbon
donor. Because this is an essential step in the
synthesis of purine nucleotides, GARFT is a target
for blocking the proliferation of malignant cells.
Several potent GARFT inhibitors, such as
pemetrexed (22, ALIMTA, LY231514) and lometrexol
(23, 5,10-dideazatetrahydrofolate, LY-264618),
have been shown to be effective antitumor agents
in clinical trials (71, 72).

These were designed through traditional 
medicinal chemistry approaches, in which an- 
of (23), including some GARFT inhibitors in
which the ring containing N5 was opened (80). 
Inspection of the structure of the bacterial 
GARFT-inhibitor complex revealed several 
important features. The pyrimidine portion of 
the pteridine was fully buried within the 
GARFT active site, forming many hydrogen 
bonds with conserved enzymic groups. The D-glutamate
moiety was largely solvent exposed,
with no immediately obvious potential for 
building additional interactions. Retention of 
the D-glutamate unmodified was also desirable 
for pharmacodynamic reasons. A significant 
opportunity was presented by the fact that the 
active site might accommodate a bulkier hy¬ 
drophobic atom than the methylene group in 
5-deazatetrahydrofolate that replaces the nat¬ 
urally occurring N5 in tetrahydrofolate. To 
test this idea, a series of 5-thiapyrimidinones 
were synthesized, including compound (24). 
These analogs were more readily prepared 
than the corresponding cyclic derivatives. 
This compound had a potency of 30-40 nM in
both a cell-based antiproliferation assay and a 
biochemical assay for human GARFT inhibi¬ 
tion. A crystal structure of human GARFT, 
complexed with (24) and glycinamide ribonu¬ 
cleotide, confirmed the structural homology 
between E. coli and human enzymes. 

Compounds with one fewer methylene in 
the linker connecting the thiophenyl moiety to 



the 5-thia position were much less active. Sev¬ 
eral other analogs, such as (25), were made in 
attempts to fill the active site more fully, and 
to restrict the conformational flexibility of the 
linker. Molecular mechanics calculations 
failed to correctly predict the conformation on 
the 5-thiamethylene group of (25) bound to 
GARFT because of unforeseen conformational 
flexibility of the enzyme revealed by an X-ray 
structure of this complex. This again emphasizes
the importance of iterative experimental
confirmation of molecular designs. Several
functional criteria in addition to GARFT inhi¬ 
bition and cell-based assays were evaluated 
during the several cycles of optimization. 
These included the ability of exogenous purine 
to rescue cells (which indicates selective 
GARFT inhibition), and the ability of the in¬ 
hibitors to function as substrates for enzymes 
involved in the transport and cellular accumu¬ 
lation of antifolate drugs. Balancing these cri¬ 
teria has resulted in the choice of compounds 
(26) and (27) (AG2034 and AG2037, respec¬ 
tively) for clinical development at Pfizer. (In 
1999, Agouron Pharmaceuticals was acquired 
by Warner-Lambert, which was subsequently 
acquired by Pfizer.) It is as yet unclear 
whether the considerable toxicity of these and 
other GARFT inhibitors will allow these com¬ 
pounds to be acceptable as anticancer drugs. 



( 25 ) 
(26) X = H

(27) X = methyl 


2.4 Proteases 

2.4.1 Angiotensin-Converting Enzyme and 
the Discovery of Captopril. The design of cap- 
topril was a landmark in the application of 
structural models for developing enzyme in¬ 
hibitors (81,82). This discovery rapidly led to 
the development of a family of therapeutically 
useful inhibitors of angiotensin-converting 
enzyme for the treatment of hypertension 
(83). The story has been reviewed thoroughly 
(for a historical perspective, see either Ref. 84 
or Ref. 85), and is briefly summarized here. 
Angiotensin II, a circulating peptide with po¬ 
tent vasoconstriction activity, is generated by 
the C-terminal hydrolytic cleavage of a dipep¬ 
tide from angiotensin I, catalyzed by angioten¬ 
sin-converting enzyme. Therefore, inhibitors 
of angiotensin-converting enzyme are vasodi¬ 
lators. [An important aside: Angiotensin I is 
generated from a precursor by the action of 
renin, an endopeptidase that is an aspartyl
protease. An orally available renin inhibitor
remains an elusive goal, although there are
still efforts under way that use SBDD methods 
(86). Renin inhibitors were early tools in the 
study of the essential aspartyl protease of hu¬ 
man immunodeficiency virus (HIV), which is 
discussed later.] 
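
The cleavage just described can be written out as a small sequence operation. The sketch below is only an illustration of the bookkeeping, not a model of the enzyme: the function name is hypothetical, and the residue list is the angiotensin I sequence Asp-Arg-Val-Tyr-Ile-His-Pro-Phe-His-Leu. Removing two C-terminal residues corresponds to the dipeptide release catalyzed by angiotensin-converting enzyme; removing one residue would correspond to an exopeptidase that releases a single amino acid.

```python
ANGIOTENSIN_I = ["Asp", "Arg", "Val", "Tyr", "Ile", "His", "Pro", "Phe", "His", "Leu"]


def cleave_c_terminal(peptide, n_residues):
    """Split a peptide into (remaining chain, released C-terminal fragment).

    peptide is a list of three-letter residue codes written N to C terminus.
    """
    if not 0 < n_residues < len(peptide):
        raise ValueError("n_residues must be between 1 and len(peptide) - 1")
    return peptide[:-n_residues], peptide[-n_residues:]


# Dipeptide release: angiotensin I -> angiotensin II + His-Leu
angiotensin_ii, released_dipeptide = cleave_c_terminal(ANGIOTENSIN_I, 2)

print("Angiotensin II:", "-".join(angiotensin_ii))
print("Released:", "-".join(released_dipeptide))
```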


Asp-Arg-Val-Tyr-Ile-His-Pro-Phe-His-Leu
Angiotensin I

Asp-Arg-Val-Tyr-Ile-His-Pro-Phe + His-Leu
Angiotensin II

A key tool in the discovery of captopril at Squibb
was the use of a model for the active site of
angiotensin-converting enzyme (Fig. 10.8). This
model was based on the already known X-ray
structure of bovine pancreatic carboxypeptidase A.
Both enzymes are C-terminal exopeptidases that
require zinc ion for activity, but differ in that
carboxypeptidase A releases an amino acid, rather
than a dipeptide. Hence, the binding site for the
angiotensin-converting enzyme was postulated to be
longer, and to contain groups to interact with the
central peptide linkage. The suggestion had been
made (87) that the inhibition of carboxypeptidase
A by benzylsuccinate could be explained by viewing
benzylsuccinate as a "by-product analog" (Fig.
10.8, top). The hypothesis was that one of the
carboxylates bound into a cationic site, whereas
the other interacted with the active site zinc. If
this were true, then a similar model for
angiotensin-converting enzyme predicted that
slightly longer diacids, designed with some regard
for the sequence preferences of the converting
enzyme, should inhibit that enzyme. This
hypothesis was quickly confirmed by the inhibitory
activity of succinyl-proline (28a).

Peptide sequences related to those of snake venom
peptides had already been used to define the
structural requirements for peptide inhibitors of
angiotensin-converting enzyme. Peptides are
unstable in vivo and poorly absorbed intestinally,
and thus are not good drug candidates. However,
the best peptide inhibitor was 500-fold more
potent than (28a). The information provided by the
peptides, the structural model for the active site
of angiotensin-converting enzyme, and biochemical
and tissue-based pharmacological assays for the
enzyme's function were used to guide an iterative
design process to improve the potency,
selectivity, and stability of small-molecule
inhibitors. The R1 and R2 substituents were
optimized, and the zinc ligand was changed to a
thiol, which significantly increased potency
(Table 10.2, compare 28a with 28c). This process
yielded the orally available and stable small
molecule captopril (28d) within 18 months of the
creation of the model.

Figure 10.8. Active site models for
carboxypeptidase A (top) and angiotensin-converting
enzyme (bottom). The design of the dipeptidyl
derivative that led to the discovery of captopril
is shown bound to the latter enzyme.

The following quotation [from the original 
research report (81) on the design of captopril]
predicted the great promise of SBDD: "The 
studies described above exemplify the great 
heuristic value of an active-site model in the 
design of inhibitors, even when such a model is 
a hypothetical one." 

2.4.2 HIV Protease. The aspartyl endoprotease
encoded by human immunodeficiency virus
(HIV-P) catalyzes essential events in the
maturation of infective virus particles, the
cleavage of polyprotein precursors to yield ac¬ 
tive products. After this was demonstrated in 
the mid to late 1980s, HIV-P became a target 
for the development of antiviral drugs to treat 
acquired immunodeficiency syndrome (AIDS). 
Several HIV-P inhibitors have been approved 
for human therapeutic use in the past 10 
years, and the speed with which they were de¬ 
veloped is attributed in part to the successful 
use of SBDD methods. There are excellent re¬ 
cent reviews of this area (88, 89). There are 
numerous reviews of the early work on HIV-P 
inhibitors (8, 9, 90, 91). 

HIV-P is a symmetrical homodimer of iden¬ 
tical 99 residue monomers, structurally and 
mechanistically similar to the pseudosymmet- 
ric pepsin family of proteases (92-94), whose 
members include renin. Because the protease 
is a minor component of the virion particle, 
intensive structural studies required overpro¬ 
duction through recombinant DNA methods. 
One of the first structures was determined 
with material synthesized nonbiologically 
(through peptide synthesis). As of June 2002, 
there were over 100 X-ray structures represented
by coordinate sets in the Protein Data
Bank, and many hundreds more have been
determined in proprietary industrial studies.

Table 10.2 Key Compounds in the Development of Captopril

Compound    Structure    IC50 for inhibition of ACE (µM)

(28a) (succinyl-proline)

The active site of the enzyme is C2 symmet¬ 
ric in the absence of substrates or inhibitors 
(Fig. 10.9a), and contains two essential aspar¬ 
tic acid residues (Asp25 and Asp25'). The en¬ 
trance to the active site is partly occluded by 
"flaps" constructed of two beta strands (resi¬ 
dues 43-49 and 52-58) from each monomer, 
connected by a turn. In the absence of sub¬ 
strate or inhibitor, the flaps seem to be rather 
flexible. Upon binding of inhibitors and pre¬ 
sumably of substrates, the residues within the 
flaps undergo movements up to several ang¬ 
stroms to interact with the bound ligand (Fig. 
10.10). A single tightly bound water is observed
in the structures of most HIV-P-inhibitor
complexes, accepting hydrogen bonds
from the backbone amides of both flap resi¬ 
dues Ile50 and Ile50' and donating to carbon¬ 
yls of the bound inhibitors. This is referred to 
as the "flap" water. Despite the presence of 
this water and several tightly bound water 
molecules on the floor of the active site, the 
cavity also contains extensive hydrophobic
surface area. The minor differences between
the HIV proteases from two major strains of
HIV (HIV-1 and HIV-2) are not addressed
here. More significant are the HIV-P sequence
variants with much reduced sensitivity to
existing drugs that have evolved because of
selective pressure and the rapid mutation rate of
the virus. The reader interested in the
differences between the proteases from HIV-1 and
HIV-2, or in the issues surrounding drug-re¬ 
sistant variants, is referred to Ref. 91 and Ref. 
89, respectively. 

The early work on inhibition of HIV-P was 
much influenced by previous structural and 
mechanistic work on pepsin and its inhibitors. 
Both enzymes are thought to catalyze peptide 
hydrolysis through a tetrahedral transition 
state, shown below as (29). The previous work
on transition state mimics as pepsin inhibitors
and the sequence of some cleavage sites for
HIV-P led to the discovery at Roche of the R
and S versions of (30) as submicromolar
inhibitors of HIV-P, with the R enantiomer being
threefold more potent (95). These inhibitors
employ a hydroxyethylamine moiety to replace
the P1-P1' linkage that is normally
cleaved (the scissile bond) with a stable group.
The lead molecules were optimized without
knowledge of the HIV-P crystal structure, to
produce (31) (Ro 31-8959, saquinavir, Fortovase).


Figure 10.9. (a) Residues in the active site of HIV
protease. The C2 axis that relates the residues of
the two monomers is indicated. The carboxylates of
Asp25 and Asp25' are the catalytic groups. Not
shown in this view are several flap residues
(Ile47/Ile47', Ile50/Ile50'), which move in to
interact with inhibitors. (b) Active site with
bound (31) [saquinavir (PDB code 1HXB)]. Note the
asymmetry of inhibitor binding. The flap water
that is shown very close to saquinavir is labeled
W. See color insert.



(30)

Saquinavir (31) was the first HIV-P inhibitor
approved for human use. Figure 10.9b shows the
asymmetrical binding mode of the molecule in the
HIV-P active site. Because the metabolic and
pharmacokinetic characteristics of this compound
and several other early HIV-P inhibitor drugs are
less than ideal, the search for better ones has
continued. Many of the deficits arise from the
large size and peptidic nature of the inhibitors.
Another early inhibitor was the modified
octapeptide (32, U-85548) developed at Upjohn (96).

(31) saquinavir, Ro 31-8959
(32)

Figure 10.10. Comparison of the structures of HIV-P
apoenzyme monomer (top, PDB code 3PHV) and the
complex between HIV-P and (32) (U-85548; bottom,
PDB code 8HVP). The inhibitor is shown as a ball
and stick structure. Note the rearrangement of the
flap residues; Ile50 is indicated for reference.
The van der Waals surface of Asp25 is shown in
both structures. The flap water (red ball) is also
shown between Ile50 and U-85548. In the bottom
structure, the locations of the N and C termini of
HIV-P are noted. See color insert.

This subnanomolar inhibitor was used to define the
extensive hydrophobic and hydrogen bonding
interactions available in the HIV-P active site
(97). A common feature in the binding of (31) and
(32) to HIV-P is the interaction of the central
hydroxyl group of the inhibitors with the
carboxylates of both Asp25 and Asp25'. This
hydroxyl group replaces a water molecule that
likely binds between these aspartyl side chains
during peptide hydrolysis by HIV-P. The inhibitors
can therefore be seen as mimics of a "collected
substrate." The liberation of this water to bulk
solvent probably contributes about 5 kcal/mol to
the free energy of inhibitor binding, based on the
studies by Rich and his colleagues on similar
inhibitors of pepsin (98, 99). An interesting
difference between (31) and (32) is that (31) has
R stereochemistry at the hydroxymethyl center,
whereas in (32) this is an S center. Part of the
reason for this is that when (31) binds to HIV-P,
the decahydroisoquinoline ring system induces a
conformational change in the protein, affecting
primarily site S1'. The optimal stereochemistry at
the hydroxymethyl center appears to be whichever
one will allow the interaction of the hydroxyl
with both catalytic aspartates while accommodating
the placement of inhibitor moieties in the S1, S2,
S1', and S2' sites with minimal conformational
strain on the inhibitor (9).
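
To give a sense of scale for the roughly 5 kcal/mol attributed above to release of the bound water, the estimate below converts that free energy into a ratio of binding constants. It is a back-of-the-envelope calculation that assumes a temperature of 298 K and that the whole 5 kcal/mol appears in the binding free energy; the labels on K are introduced here only for illustration.

```latex
\[
\frac{K_{\mathrm{water\ retained}}}{K_{\mathrm{water\ released}}}
  = \exp\!\left(\frac{\Delta\Delta G}{RT}\right)
  = \exp\!\left(
      \frac{5\ \mathrm{kcal\,mol^{-1}}}
           {\left(1.987\times10^{-3}\ \mathrm{kcal\,mol^{-1}\,K^{-1}}\right)\left(298\ \mathrm{K}\right)}
    \right)
  \approx e^{8.44}
  \approx 4.6\times10^{3}
\]
```

Under these assumptions, displacing the water would account for a gain of between three and four orders of magnitude in affinity.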

Both (31) (Fig. 10.9b) and (32) (Fig. 10.11)
bind to the HIV-P active site asymmetrically. 
However, after the X-ray studies of crystalline 
HIV-P apoenzyme revealed it to be a symmet¬ 
rical dimer, C2 symmetric inhibitors were de¬ 
signed to take advantage of this structural fea¬ 
ture (Fig. 10.12). Both alcohol diamines and 
diol diamines were examined. For example, 
the C2 symmetric compound (33) (A-77003) 
was synthesized at Abbott and entered clinical 
trials as an antiviral agent for intravenous 
treatment of AIDS (100). 
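
The idea of a C2 symmetric ligand can be made concrete with a simple geometric test: rotate every atom by 180 degrees about the proposed axis and check that it lands on an atom of the same element. The sketch below is a minimal illustration under stated assumptions. The coordinates are hypothetical toy values, not taken from any structure discussed here, the axis is assumed to pass through the origin, and the function names are invented for this example.

```python
import math


def rotate_180(point, axis):
    """Rotate a 3D point by 180 degrees about an axis through the origin.

    For a unit axis u, a half-turn maps p to 2(u.p)u - p.
    """
    norm = math.sqrt(sum(a * a for a in axis))
    unit = [a / norm for a in axis]
    dot = sum(p * u for p, u in zip(point, unit))
    return tuple(2 * dot * u - p for p, u in zip(point, unit))


def is_c2_symmetric(atoms, axis, tol=0.1):
    """Return True if each atom's half-turn image lies within tol of an atom
    of the same element.

    atoms is a list of (element, (x, y, z)) tuples; tol is in the same
    length units as the coordinates.
    """
    for element, xyz in atoms:
        image = rotate_180(xyz, axis)
        if not any(
            other_element == element and math.dist(image, other_xyz) <= tol
            for other_element, other_xyz in atoms
        ):
            return False
    return True


# Hypothetical toy coordinates: paired atoms related by a half-turn about z,
# plus one oxygen lying on the axis itself.
symmetric_atoms = [
    ("O", (1.0, 0.5, 0.0)),
    ("O", (-1.0, -0.5, 0.0)),
    ("C", (0.7, -0.3, 0.4)),
    ("C", (-0.7, 0.3, 0.4)),
    ("O", (0.0, 0.0, 1.2)),
]

# Adding an unpaired atom breaks the symmetry.
asymmetric_atoms = symmetric_atoms + [("N", (1.5, 0.0, 0.0))]

print(is_c2_symmetric(symmetric_atoms, (0.0, 0.0, 1.0)))
print(is_c2_symmetric(asymmetric_atoms, (0.0, 0.0, 1.0)))
```

A test of this kind only checks the ligand geometry. Whether a symmetric ligand actually binds symmetrically still has to be established from the structure of the complex.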

The X-ray structures of complexes between HIV-P
and diol diamine derivatives like (33) showed
(101) that, although one of the hydroxyl groups
bound between the catalytic aspartyl carboxylates
and made contacts with both, the second hydroxyl
made only one such contact. Thus the cost of
desolvating the second inhibitor hydroxyl upon
binding is not compensated by strongly favorable
interactions in the complex (8). This led to the
deletion of the second hydroxyl, as seen in
compound (34), another compound in this program at
Abbott. Further structural modifications, to
enhance solubility and metabolic stability, were
guided by the fact that the "ends" of the
protease-bound inhibitors were relatively solvent
exposed and made fewer contacts with the enzyme
(102). Deletion of a valine residue (33 → 34) gave
a smaller compound, presumably aiding solubility
and absorption. The eventual product of this
program was ritonavir (35, A-84538, ABT-538, or
Norvir), which has been successfully launched.

(33) A-77003

Figure 10.11. Orthogonal views of the complex
between HIV-P and (32) (U-85548). The view in panel
a is rotated approximately 90° (around the long
axis of the protein) from the view in panel b. Van
der Waals surfaces of Asp25, Asp25', and the flap
water (W) are shown. In panel b, the
solvent-accessible surface of the inhibitor is
shown. See color insert.

Figure 10.12. Design principle for C2 symmetric inhibitors of HIV-P and the related
hydroxyethylene diamine scaffold.

Another C2 symmetric HIV-P inhibitor, discovered
at DuPont Merck, is compound (36) (DMP-450). This
was one of a series of cyclic ureas designed to
interact with both the aspartyl carboxylates and
the Ile50 and Ile50' backbone amides that hydrogen
bond with the flap water (103). The compounds
interacted with HIV-P in a highly symmetrical
fashion, as they had been designed to do, with the
urea oxygen replacing the flap water. Compound
(36) was licensed to Triangle Pharmaceuticals, and
the mesylate advanced into Phase I clinical
trials. Its future is uncertain after the trials
were put on hold because of animal toxicity
(http://www.tripharm.com/dmp450.html).

One of the problems common to many of the HIV-P
inhibitors already discussed is their low
solubility, which translates to low
bioavailability. The discovery of (37) (indinavir,
L-735,524) was the result of the successful
application of SBDD at Merck to directly address
this problem. During an iterative optimization
process, the physicochemical properties of HIV-P
inhibitors were modified within constraints that
were established structurally (104). Crixivan (the
sulfate of 37) was successfully launched for use
as an antiviral drug.

(35) ritonavir
(37) indinavir

The process leading to indinavir (Fig. 10.13)
began with (38), a hydroxyethylene-containing
heptapeptide mimic, originally designed as a renin
inhibitor (105). The inhibition of HIV-P by (38)
was discovered by screening. Classical medicinal
chemistry methods allowed a reduction in size, and
the discovery of an amino-2-hydroxyindan moiety to
replace the terminal dipeptide (corresponding to
P2', thought to bind into the S2' site). This
approach (105, 106) resulted in the generation of
(39) (L-685,434). Although (39) had a
subnanomolar IC50 for inhibition of HIV-P, it also
had very low aqueous solubility, like most
peptidomimetics. One way to improve solubility is
to insert a charged functional group into the
molecule. The tertiary amino group in the HIV-P
inhibitor saquinavir (31) was already identified.
Piracy of the decahydroisoquinoline
tert-butylamide from (31) provided the idea for
the hybrid molecule (40). In addition to the
charged group, use of this ring system would
partly "preorder" the inhibitor's structure,
lessening the entropic cost of binding. Molecular
modeling was used with known structures of
HIV-P-inhibitor complexes to evaluate this idea,
and it was judged to be reasonable enough to
justify the synthesis of (40) (104). This compound
was subsequently shown to have much better
pharmacokinetic behavior than its antecedents,
consistent with improved solubility and
dissolution.

(38) (boc = tert-butyloxycarbonyl)
(39) (cbz = benzyloxycarbonyl)

Figure 10.13. Structures of HIV protease inhibitors during the optimization process leading to
the discovery of (37) (indinavir).

A convergent synthetic route was devised to generate (40) to improve the accessibility of important analogs. Although (40) was an 8 nM inhibitor of the isolated enzyme, better potency was needed for acceptable cell-based activity, and still better solubility characteristics were needed. A method for structure-based computational estimation of the interaction energy for HIV protease inhibitors with the enzyme was developed and used to help estimate inhibitor potency before synthesis (107). Variation of the group contributing the tertiary amine led to the discovery of the piperazine derivative (41) (L-732,747), which had subnanomolar potency against HIV-P. The X-ray structure of the HIV-P complex with (41) confirmed the binding mode predicted by molecular modeling, with the molecule filling the S2, S1, S1', and S2' pockets, and the S3 pocket occupied by the terminal benzyloxycarbonyl moiety. Replacement of the benzyloxycarbonyl with more polar heterocycles, chosen to be accommodated by the S3 pocket and to further improve aqueous solubility, yielded (37).

Several other approved AIDS drugs that act by inhibition of HIV-P have also been developed through use of SBDD methods. Compound (42) (amprenavir, Agenerase, also known as VX-478) is the most recent addition to the HIV-P inhibitors approved for human antiviral treatment, and differs significantly from earlier inhibitors. Compound (42) was specifically designed by Vertex scientists to minimize molecular weight and thereby increase oral bioavailability (108). Compound (43) (nelfinavir, AG-1343, also known as LY312857), like the precursors to the earlier drug (37) (indinavir), copied the decahydroisoquinoline tert-butylamide group from the first marketed HIV-P inhibitor (31) (saquinavir). Compound (43) was developed in a collaboration between scientists at Lilly and Agouron (109), and is marketed by Pfizer as Viracept, the mesylate salt of nelfinavir. In both (42) and (43), the scientists involved used iterative SBDD methods to alter the physicochemical properties of the drug molecule while maintaining potency by optimizing interactions with the active site of the enzyme. An important feature shared by these compounds is that the bound inhibitors appear to be in low energy conformers, so that minimal conformational energy costs must be paid upon binding to the enzyme.

(42) amprenavir

(43) nelfinavir

2.4.3 Thrombin. Thromboembolic diseases such as stroke and heart attack are major health problems, especially in many Western countries. This has led to searches for drugs that are effective inhibitors of various serine endoproteases in the blood-clotting cascade, such as factor Xa and thrombin. Existing therapeutic agents such as the coumarins (like warfarin), heparin, and hirudin have problems related to their absorption or unpredictable metabolism and clearance. Recently, new small molecule inhibitors of thrombin have become available for human use in the United States, including (44) (argatroban, MD-805, developed by Mitsubishi) and (45) (melagatran, developed by AstraZeneca) (110, 111). These nanomolar inhibitors of human thrombin were optimized by classical medicinal chemistry, starting with peptidomimetics similar to the thrombin cleavage site in fibrinogen (see Fig. 10.14a). Because they are poorly absorbed orally, they must be administered intravenously or at best subcutaneously. At present, the only direct inhibitor of thrombin suitable for oral administration is ximelagatran, a prodrug form of melagatran in late development for various cardiovascular indications by AstraZeneca as of mid-2002. The therapeutic need and the availability of high quality crystal structures for human thrombin bound to inhibitors such as (44) make this an attractive target for SBDD (112). The significant efforts at Merck to use SBDD approaches to develop orally available inhibitors of thrombin, which have yielded compounds that have entered clinical trials, have been reviewed (113,114). For a good overview of this area, see the review by Babine and Bender (9).

Compound (46) [NAPAP, N-alpha-(2-naphthylsulfonylglycyl)-4-amidinophenylalanine piperidide] is a moderately potent inhibitor of human thrombin, but was found to have an unacceptably short plasma half-life in animals (115). However, (46) has been a useful experimental tool and a variety of analogs have been made. The structures of (44) and (46) bound to human thrombin show that they bind somewhat differently, as shown in Figure 10.14b (112,116). However, both form hydrogen bonds with the backbone at Gly216 (part of the oxyanion hole), and both fill the S1 specificity pocket with a permanent cation attached to an extended hydrophobic group. Compound (46) was the starting point at Boehringer Ingelheim for the development of the orally bioavailable prodrug (47) (BIBR-1048) that generates in vivo a potent inhibitor of human thrombin (117). Compound (47) is currently in human clinical trials.

Scientists at Boehringer Ingelheim used the crystal structure of the complex between (46) and human thrombin to design a replacement for the central bridging glycine moiety. The hypothesis that a trisubstituted benzimidazole could correctly place groups into the S1, S2, and S4 pockets was confirmed. The first such compound made was (48). The IC50 for thrombin inhibition by (48) was only 1.5 µM, but the compound had an improved serum half-life in rats. Determination of the crystal structure of the thrombin-(48) complex showed that (48) binds in a similar fashion to
(46). The N-methyl on the benzimidazole fit 
into the P x pocket, and the phenylsulfonyl 
group extended into S4. The low affinity is likely attributable to the fact that (48) forms no hydrogen bonds with the backbone of Gly216. An iterative optimization process (Fig. 10.15) was used to regain the lost affinity, eventually surpassing the thrombin affinity of the starting point (46) (0.2 µM).

Surprisingly, the N-methyl group could not 
be replaced with larger alkyl substituents, de¬ 
spite what appeared to be room for them in the 
P x pocket. However, replacing phenyl with 
larger aryl groups such as naphthyl or quinolinyl on the sulfonamide provided favorable interactions in the P4 pocket. The crystal structure suggested that the increased lipophilicity of such aryl groups could be balanced by appending charged substituents to
Figure 10.14. (b) Schematic comparison of the binding interactions for (44) and (46) in X-ray structural models of crystalline thrombin.



2.4.4 Caspase-1. Caspase-1 (interleukin-1β converting enzyme, or ICE) is a member of a family of cysteine proteases that catalyze the cleavage of key signaling proteins in such processes as inflammatory response and apoptosis. Genetic methods have provided evidence supporting a role for caspase-1 in diseases such as stroke (118) and inflammatory bowel disease (119). The X-ray structure of crystalline human caspase-1 was solved in 1994 by several groups (120,121), and has been a valuable tool in intensive efforts to design potent and bioavailable inhibitors of the enzyme.

Compound (52) (pralnacasan, VX-740) was developed as a caspase-1 inhibitory therapeutic agent through use of SBDD in a collaboration between Vertex and Aventis. Although the details of the discovery process have not been published, (52) probably functions as a prodrug. The cleavage of the lactone of (52) would yield a hemiacetal that could hydrolyze to release ethanol and the aldehyde form of the drug, which then can form a covalent thioacetal with the active site thiol of caspase-1, leading to pseudoirreversible inhibition. Clinical trials of compound (52) as an anti-inflammatory agent for treatment of rheumatoid arthritis began in 1999 (122). In April 2002, the
(52) pralnacasan



(53) prinomastat, AG3340 



(54) CGS 27023 


compounds each have affinities in the nanomolar to picomolar range for several MMPs. The inhibitory profiles and ongoing clinical trials of a variety of drug candidates that inhibit MMPs were reviewed in 2000 (124).

Compound (53) was developed at Agouron through use of SBDD (125) and is under clinical investigation by Pfizer as an anticancer drug and as a treatment of proliferative retinopathy. Compound (54) is a stromelysin inhibitor discovered at Novartis (126), without explicit structural guidance. However, the lead molecule from which (54) was developed was originally obtained by X-ray structure-based inhibitor design targeted against the bacterial zinc-protease thermolysin. Compound (55), with particularly high affinity for the gelatinases, was also developed with consideration of the structures of other MMP-inhibitor complexes, but not through use of iterative SBDD (127). The clinical trials of compounds (54) and (55) have been suspended because of their disappointing efficacy (124). It remains somewhat uncertain which MMP is responsible for specific diseases, and the possibility for biological redundancy suggests that inhibition of several MMPs may be required for treatment of some diseases. SBDD clearly could have a major impact on the discovery of selective MMP inhibitors. These could be useful tools in dissecting the disease relevancy of these targets, as well as providing the selectivity and bioavailability required of effective drugs.

(55) tanomastat, Bay 12-9566

2.5 Oxidoreductases 

Oxidoreductases catalyze the oxidation or reduction of carbon-carbon, carbon-oxygen, or carbon-nitrogen bonds. Frequently, nicotinamide cofactors are involved, with the oxidized and reduced forms (respectively, NAD+ or NADP+ and NADH or NADPH) receiving or donating the equivalent of a hydride during this process. Nicotinamide-linked oxidoreductases that have been targeted for the discovery of new therapeutic agents include aromatase, dihydrofolate reductase (mentioned above), aldose reductase, and inosine monophosphate dehydrogenase. SBDD methods have been successfully applied recently to the latter two enzymes to discover agents that are currently in human testing. The efforts with these two targets are described briefly below.

2.5.1 Inosine Monophosphate Dehydrogenase. Proliferative cells such as lymphocytes have high demands for the rapid supply of nucleotides to support DNA and RNA synthesis, as do viruses during their proliferative phase. The first dedicated step in the de novo biosynthesis of guanine nucleotides is conversion of inosinate (IMP) to xanthosine monophosphate (XMP), catalyzed by inosine monophosphate dehydrogenase (IMPDH).

IMP + NAD+ → XMP + NADH
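The equation as printed shows only the nucleotide and the cofactor. For reference, the commonly cited balanced stoichiometry of the IMPDH reaction also includes water and a proton. The form below is the standard textbook statement, not something taken from this chapter:

```latex
\[
\mathrm{IMP} + \mathrm{NAD^{+}} + \mathrm{H_2O}
\longrightarrow
\mathrm{XMP} + \mathrm{NADH} + \mathrm{H^{+}}
\]
```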

A prodrug form of (56) (mycophenolic acid), a noncompetitive inhibitor of IMPDH, is approved for human therapeutic use as an immunosuppressant (mycophenolate mofetil, CellCept). The use of this drug is hampered by gastrointestinal side effects probably related to the metabolism of the drug. A second class of IMPDH inhibitors is represented by the nucleoside analog mizoribine (also known as bredinin), a prodrug approved for human use in Japan. Such compounds competitively inhibit IMPDH in vivo after phosphorylation (128). These drugs validate the strategy of targeting IMPDH for the discovery of immunosuppressants. Other utilities that have been suggested for IMPDH inhibitors are antiviral and anticancer therapies.

The structure of hamster IMPDH in complex with IMP and (56) was solved at Vertex in the mid-1990s (129). This allowed the visualization of a covalent intermediate, in which a cysteine thiol from the enzyme adds to C2 of the purine ring of the nucleotide substrate. An analogous covalent adduct is postulated to be a key catalytic intermediate during normal turnover (130). The structure was a key tool in the discovery of (57) (VX-497, merimepodib), a novel potent inhibitor of human IMPDH suitable for oral administration (131).

An experimental screen of a diverse library of commercially available compounds for inhibitors of IMPDH identified molecules with the phenyl, phenyloxazole urea scaffold (58) as weak inhibitors. Through use of the computational program DOCK (132), the initial inhibitors were built as models into the experimental structure of the crystalline complex of IMPDH, IMP, and (56). Structural analogs were generated to improve potency in an iterative process, guided by the structural modeling and the observed changes in potency for inhibition of human IMPDH.

After this process yielded compound (59), with nanomolar potency, an X-ray structure was determined of (59) bound to the hamster enzyme with IMP. This revealed both similarities and differences between the binding modes of (56) and (59). Aryl groups of both compounds pack against the covalently tethered purine of the nucleotide. Several hydrogen bonding and hydrophobic interactions with the enzyme are also common between the two inhibitors. However, there are several hydrophobic and van der Waals interactions seen in the complex with (59) that are not present with (56). Importantly, the urea moiety of (59) forms a network of hydrogen bonds with an aspartyl carboxylate that is not present in the complex with (56). Further modification of the structure was guided by the X-ray study of (59), to gain potency in a cell-based assay for inhibition of lymphocyte proliferation. This provided compound (57), which Vertex has advanced into clinical trials for treatment of hepatitis C infections.

(57) merimepodib

(59)

2.5.2 Aldose Reductase. Aldose reductase has been implicated in many of the pathologies resulting from elevated tissue levels of glucose in diabetes mellitus (133, 134). This nicotinamide-dependent enzyme catalyzes the conversion of glucose to sorbitol, accumulation of which ultimately results in damage to the eyes, the nervous system, and the kidneys. Given the enormous damage caused by this disease and the difficulty in regulating blood glucose, selective and potent inhibitors of human aldose reductase offer great potential benefit. However, existing drugs that target aldose reductase have unreliable efficacy (135). For example, compound (60) (tolrestat) was withdrawn by Wyeth in 1996 because of poor clinical response. Hence, there is still a need to provide an inhibitor of this enzyme that fulfills the potential in the clinic. To minimize the risk of undesired toxicities, clinical agents that target aldose reductase should not inhibit the closely related aldehyde reductase, an essential hepatic enzyme.

(60) tolrestat

The structure of (60) and other inhibitors bound to porcine aldose reductase (136) provided a rich lode of information on the requirements for potent and selective inhibition of aldose reductase. This was mined by scientists at the Institute for Diabetes Discovery, in a project that began in 1996. The Institute for Diabetes Discovery filed an IND application for (61) (lidorestat, IDD 676), a potent aldose reductase inhibitor, for treatment of diabetic complications, within 30 months of initiating the discovery project on this target. The speed with which this was achieved appears to be due in large part to the use of SBDD methods.

The X-ray structures showed the cofactor NADP+ buried within the enzyme, with its C4 redox center exposed at the bottom of a deep hydrophobic cleft. An anionic binding site is located near NADP+. Several potent inhibitors bind within the hydrophobic cleft and interact with the anionic site. The binding of potent inhibitors induces a conformational change, opening an adjacent hydrophobic pocket. The conformation induced by (60) differs from that caused by other, less selective inhibitors. This "specificity" pocket was thought to offer an opportunity for selective inhibition of aldose reductase while sparing aldehyde reductase. Hence, this structural study provided an initial pharmacophore for both potency and selectivity.

The SAR for this pharmacophore was developed with a series of synthetically accessible salicylic acid derivatives that were scored for potency and selectivity with the purified enzymes, and efficacy in a diabetic rat model (137). One of the most potent and selective of the derivatives was (62), containing the benzothiazole heterocycle. The SAR was employed, guided by the structures of selected inhibitor complexes, to design a novel indole scaffold to present the pharmacophoric elements (M. Van Zandt, personal communication). The optimization of this series provided the clinical candidate (61) (138).

(62)

2.6 Hydrolases 

Some other hydrolytic enzymes, in addition to proteases, that are important drug targets include protein phosphatases, phosphodiesterases, nucleoside hydrolases, acetylhydrolases, glycosylases, and phospholipases. Structure-based inhibitor design is currently being applied to a number of these enzymes. The last three mentioned have been successfully targeted in SBDD projects that have produced compounds that are either launched or in clinical trials.

2.6.1 Acetylcholinesterase. A decrease in the level of the neurotransmitter acetylcholine is one of the most pronounced changes in brain chemistry observed in the sufferers of Alzheimer's disease (139). Several drugs that are approved for the treatment of the dementia thought to result from this neurotransmitter deficit act by inhibiting acetylcholinesterase. These include (63) (tacrine, or Cognex, a Pfizer drug that was the first such agent approved for this indication), (64) (donepezil), and (65) (rivastigmine). Several other agents are in clinical trials. Disappointing efficacy is observed with the existing drugs, arising from dose limitations that are likely attributable to the inhibition of acetylcholinesterase in peripheral tissues (140). This may be a consequence of the high serum levels required to get these highly cationic molecules to penetrate the blood-brain barrier.

(65) rivastigmine

In a discovery project that is reminiscent of the discovery of captopril, scientists at Takeda created a hypothetical structure for the active site of acetylcholinesterase, based on SAR from previous biochemical and medicinal chemical work (141). The model consisted of (in addition to the serine protease-like catalytic machinery) an anionic binding site separating two discrete hydrophobic binding sites. This model was then used to design inhibitors of the enzyme (reviewed in ref. 142). One set of analogs examined was based on the N-(ω-phthalimidylalkyl)-N-(ω-phenylalkyl)amine scaffold (66). An iterative process of testing, analysis, design, and synthesis, by use of this and closely related scaffolds, resulted in the production of (67) (TAK-147), which is currently in clinical trials for treatment of the dementia resulting from Alzheimer's disease (142).

(66)

The design of (66) was partially based on the structures of previously known inhibitors. The two aryl substituents were intended to bind to the hydrophobic binding sites, placing the central amine cation into the anionic binding site. The length of both alkyl linkers was varied, and the effect of adding a third alkyl substituent was examined. The phthalimide portion of the structure was chosen to improve the synthetic accessibility of the analogs needed for this exercise. The compounds were tested not only for inhibitory potency toward rat cerebral acetylcholinesterase, but also for peripheral response and toxicity in dosed intact rats. After the work was under way, Sussman and coworkers solved the atomic structure of acetylcholinesterase from the electric eel, including complexes with several inhibitors, by X-ray crystallography (143). The availability of this structure made it possible to retrospectively analyze the basis for the SAR in this series of compounds, by use of DOCK (144).


2.6.2 Neuraminidase. Influenza virus infections cause severe human suffering throughout the world and economic damage in the billions of dollars annually, although some years are worse than others. In 1918 a pandemic caused by this disease killed an estimated 40 million people (145). An important protein in the infectious process is the viral neuraminidase, an integral membrane protein whose catalytic domain is exposed on the viral surface. Neuraminidase catalyzes the hydrolytic cleavage of sialic acid (68, N-acetylneuraminic acid) from glycoproteins and extracellular mucin on the surface of the host cell. A different viral surface protein tightly binds to terminal sialic acid residues, which promotes the initial infection, but prevents release of viral progeny from the host cells, unless and until the terminal sialic acids are hydrolytically cleaved by viral neuraminidase. Thus, neuraminidase enables the infection to propagate.

The first X-ray structure of influenza neuraminidase was determined in the early 1980s (146). Ten years later, a landmark paper (147) described a highly efficient drug design project at Monash University in Australia. This project yielded antiviral compound (69) (zanamivir, Relenza, or Flunet), which was developed into one of the first drugs to be created through use of SBDD. Previous structural work had revealed that the active site of neuraminidase has several rigid pockets and numerous charged groups. Electrostatic interactions significantly affect the conformation of bound sialic acid, which is deformed into a high energy conformer, attributed in part to the interactions between the 1-carboxylate and arginine side-chains of the protein. This deformation may play a key role in catalysis. Synthesis of a sialic acid analog that is dehydrated across the C2-C3 bond of (68) had provided the putative transition state mimic (70) (sometimes referred to as Neu5Ac2en, or as 2-deoxy-2,3-dehydro-N-acetylneuraminic acid, DANA).

(69) zanamivir

(68) sialic acid

(70) Neu5Ac2en, DANA

Compound (70) inhibits neuraminidase with micromolar potency (148). Examination of the binding mode of (70) in the active site of neuraminidase (Fig. 10.16) led to the replacement of the 4-hydroxyl by cationic groups, first an amino and then a guanidino group (147). These groups strongly interact with anionic amino acid side chains (corresponding to Glu120 and Glu229 shown in Fig. 10.16) in the neuraminidase active site. In the case of the guanidine substitution, the binding affinity for neuraminidase was increased about 5000-fold and provided (69), which inhibits viral release in cell cultures and decreases the severity of influenza virus infections in humans. Subsequently, the X-ray structures of neuraminidases from several different influenza subtypes complexed with (69) were analyzed (149). Although the positions of protein residues were well conserved, the water structure seen in these different complexes was quite variable. This may explain the varying potency of (69) against different strains of virus.
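To put the roughly 5000-fold affinity gain in energetic terms, a fold change in binding constant can be converted to a free-energy difference with the standard relation ΔΔG = -RT ln(fold). The sketch below is illustrative only: the function name `ddg_from_fold` and the temperature of 298.15 K are assumptions, not values from the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold(fold, temp_k=298.15):
    """Free-energy change (kcal/mol) for a fold improvement in affinity.

    Negative values mean tighter binding. temp_k is an assumed temperature.
    """
    if fold <= 0:
        raise ValueError("fold must be positive")
    return -R_KCAL * temp_k * math.log(fold)


# About 5000-fold tighter binding, as quoted for the guanidino analog
print(round(ddg_from_fold(5000), 2))
```

Under these assumptions, a 5000-fold gain corresponds to roughly 5 kcal/mol of additional binding free energy, which is on the scale expected for a strong charged interaction of the kind described for the guanidino group.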

One problem with (69) is that it is not well absorbed by an oral route, and so must be administered as an aerosolized powder inhaled into the virus-infected lungs. Two other neuraminidase inhibitors with nanomolar affinities (71 and 72) have been developed through the use of SBDD methods to yield orally bioavailable drugs. The development of these agents was facilitated by the fortuitous discovery by scientists at BioCryst that analogs of (69) in which the cyclic scaffold is a phenyl moiety are much more potent inhibitors if they lack the glycerol side chain. This was subsequently shown by X-ray structural studies to result from the creation of an unanticipated hydrophobic pocket upon rearrangement of the Glu278 side chain carboxylic acid, which forms several hydrogen bonds with the glycerol portion of (69) (Fig. 10.16).

Figure 10.16. View from above: Polar amino acid side-chains surrounding (70), bound into the active site of influenza virus neuraminidase (Scheme 10.1 based on PDB code 1NNB, the coordinates of an X-ray structure described in Ref. 148).

Replacement of the permanently cationic guanidine by an amine (71) promoted better intestinal absorption, but also greatly decreased the affinity for neuraminidase. Structure-guided modification of the carbocycle's substituents was used to recover this lost potency. Compound (71) (GS 4071) was developed by Gilead Sciences (150). The ethyl ester of (71) is a prodrug (oseltamivir or GS 4104) that has been approved for oral dosing to treat influenza infection. Another amphiphilic carbocycle, compound (72) (peramivir, RWJ-270201, or BCX 1812) was developed by BioCryst (151) through use of SBDD, and is in clinical trials. The use of clever synthetic routes, biochemical assays for neuraminidase inhibition, a mouse infection model, and X-ray structural information were all valuable tools in the development of both (71) and (72). Optimization of the affinity required the examination of a variety of alkyl substituents in both cases, to exploit the new hydrophobic pocket created by the conformational change primarily involving Glu278. The ability of the cyclopentyl ring in (72) to replace the six-membered ring illustrates that differing central scaffolds can display the essential interacting groups in an effective way.



2.6.3 Phospholipase A2 (Nonpancreatic, Secretory). Phospholipases A2 (PLA2s) are a diverse family of hydrolases that cleave the sn-2 ester bond of phospholipids. The fatty acid produced is frequently arachidonate, the precursor to the proinflammatory eicosanoids. In several human inflammatory pathologies (e.g., septic shock, rheumatoid arthritis), a nonpancreatic secretory form of PLA2 (hnps-PLA2) is present in extracellular fluids at levels many-fold higher than normal (152). The design of bioavailable inhibitors of this Ca2+-dependent isoform of PLA2 as anti-inflammatory





drugs is therefore an attractive goal (153). To be an effective drug, such an inhibitor would also need to be selective for hnps-PLA2 vs. the closely related pancreatic PLA2. Whether selectivity is needed against the quite different cytosolic PLA2 is unclear.

Investigators at an AstraZeneca laboratory (previously Fisons) have used multidimensional NMR and computational techniques to develop an active site model for cytosolic PLA2 (154,155). Synthesis of compounds based on this model led to (73) (FPL-67047), reported to be a development candidate for treatment of inflammation (156).

(73) FPL-67047

Investigators at Eli Lilly began a project to develop PLA2 inhibitors by investing the effort to clone, overproduce, purify, crystallize, and determine the structure of hnps-PLA2 (157). This also provided the reagent needed for a massive screening campaign to identify hnps-PLA2 inhibitors. They were thus prepared to apply SBDD methods when the screening of Lilly's small molecule collection yielded a weak inhibitor. The hit (74) was surprisingly similar to indomethacin (75), a nonsteroidal anti-inflammatory drug that acts by inhibiting cyclooxygenase.

(74)

(75) indomethacin


The crystal structures of recombinant hnps-PLA2 bound to (74) and (75) were solved (158), and compared with the previously known structures of PLA2s complexed with substrate mimics (159, 160), including the phosphonate-containing transition state analog (76). The earlier structures revealed several key features. These were: (1) the filling of a significant hydrophobic crevice, (2) the displacement (by the sn-2 alkyl moiety) of the His6 side-chain into a solvent-exposed position to create an adjacent cavity, (3) the coordination of the active site calcium, and (4) formation of hydrogen bonds to His48 and Lys69. The polar contacts were provided by the nonbridging phosphate and phosphonate oxygens in the complex with (76).

(76) hnps-PLA2 transition-state analog

The screening hit (74) bound in the hydrophobic crevice, similarly to the substrate mimics, with the 1-benzyl moiety of (74) bound in the adjacent cavity and displacing the enzyme's His6 imidazole. However, there were two surprising findings. First, despite the presence of 10 mM calcium in the crystallization liquor, there was no bound calcium, an essential active-site component that binds only weakly (Kd = 1.5 mM). Second, the carboxylic acid of (74) formed a hydrogen bond with another active-site acid, the side chain of Asp49. The latter finding again emphasizes the importance of experimental structures to guide improvements of inhibitor potency, given that placing two presumed anions so close together would likely never have been predicted by a computational model. Other slight conformational changes were observed to accommodate the 5-methoxy group of (74).
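The surprise at the missing calcium can be made concrete with a simple one-site binding estimate, occupancy = [Ca] / ([Ca] + Kd). This is a minimal sketch that assumes a single independent site and free calcium equal to the nominal 10 mM; the function name `fractional_occupancy` is hypothetical.

```python
def fractional_occupancy(ligand_mm, kd_mm):
    """Fraction of sites occupied for simple one-site binding.

    Both concentrations must be in the same units (here mM).
    """
    if ligand_mm < 0 or kd_mm <= 0:
        raise ValueError("ligand must be >= 0 and Kd must be > 0")
    return ligand_mm / (ligand_mm + kd_mm)


# 10 mM calcium in the crystallization liquor, Kd = 1.5 mM from the text
print(f"{fractional_occupancy(10.0, 1.5):.2f}")
```

With these numbers the model predicts that most sites, close to 87%, would carry calcium in the free enzyme, which is why its absence in the complex with (74) was unexpected.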

The inhibitor's 3-acetate moiety was converted to an acetamide in a successful attempt to restore the active-site calcium, form a hydrogen bond to His48, and increase potency. The crystal structure of the complex with the amide version of (74) also revealed a significant reorientation of the indole core and 5-methoxy substituent, resulting in an unanticipated 5-Å movement of the terminal methyl. Further changes in inhibitor structure were guided by iterative structural studies and functional assays of potency and selectivity. These changes involved the use of substituents at positions 3 or 4 to optimize coordination of the metal ion, extension of the van der Waals interaction by lengthening the 2-methyl to an ethyl, and conversion of the 3-acetamide to glyoxamide (159, 161). This resulted in the synthesis of (77) (compound LY315920), which has 6500-fold greater affinity for hnps-PLA2 than did the original hit molecule (74). LY315920 effectively inhibits hnps-PLA2 in the serum of transgenic mice dosed with the compound orally or i.v., and is undergoing clinical trials in the United States and Japan (162, 163).

(77) LY315920
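To put the 6500-fold affinity gain in energetic terms, the following minimal sketch converts a fold change in affinity into a difference in binding free energy using the standard relation ΔΔG = -RT ln(fold). The function name and the choice of 298.15 K are illustrative assumptions, not taken from the cited studies.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold_change(fold, temperature_k=298.15):
    """Change in binding free energy (kcal/mol) for a fold gain in affinity.

    A negative value means tighter binding. The temperature is an assumed
    default, not a value reported for the PLA2 assays.
    """
    if fold <= 0:
        raise ValueError("fold must be positive")
    return -R_KCAL * temperature_k * math.log(fold)


if __name__ == "__main__":
    # 6500-fold improvement from hit (74) to (77), as stated in the text
    print(f"{ddg_from_fold_change(6500):.1f} kcal/mol")
```

Under these assumptions the optimization from (74) to (77) corresponds to roughly 5 kcal/mol of additional binding free energy, a magnitude comparable to a few well-formed polar contacts.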

2.7 Picornavirus Uncoating 

Picornaviruses, which include the rhinoviruses and enteroviruses, are RNA viruses that cause several infectious human diseases. These diseases include common colds as well as life-threatening infections of the respiratory and central nervous systems. Effective treatments of these diseases would relieve much human suffering, save many lives, and have great economic benefit. There are over 100 serotypes of rhinoviruses alone, making it impossible to generate a vaccine effective against infections by all variants of the virus (164).

The Achilles heel of picornaviruses has been suggested to be that part of the virus structure that interacts with the cell surface receptor because those structural features must be well conserved (165). The virus particle consists of a positive-strand RNA coated by an icosahedral shell, containing 60 copies of four distinct β-barrel proteins (166). These structural proteins contain the binding site for the cellular receptor and undergo significant conformational changes to liberate the viral RNA genome during infection of the cell. A series of isoxazoles that inhibit this picornavirus "uncoating" process was discovered in the early 1980s by scientists at Sterling Winthrop, by use of an in vitro cellular assay for antiviral activity (167-170). One of these, compound (78) (WIN-51711, disoxaril), gave a 50% suppression of viral plaque formation in this assay at 0.3 μM. Compound (78) was also effective in animal models (171) and entered phase I clinical trials, but failed to advance because of its toxicity. Compound (78) was shown (172) to bind to viral capsid protein VP1, within a hydrophobic pocket in the floor of the "canyon" that contains the binding site for the cell surface receptor (Fig. 10.17A). Structural changes induced in the canyon floor upon binding of such molecules may also inhibit receptor binding directly (173). X-ray crystallographic studies of (78) and analogs bound to the target protein VP1 were an essential part of the iterative optimization process that led to safer and more effective antiviral agents (174-176). The goal of the process was to generate a compound that is potent, chemically and metabolically stable, and effective against as many serotypes of the virus as possible. There was therefore a need to balance potency and selectivity, and the structural information helped to guide compound design in pursuit of this balance.

(78) WIN-51711, disoxaril

Figure 10.17. Structure of rhinovirus capsid protein VP1 showing the bound conformation of antiviral isoxazole compounds (78) [disoxaril, WIN-51711: panel a, top], (79) [WIN-54954: panel b, middle], and (80) [pleconaril, WIN-63843: panel c, bottom]. The PDB codes for the X-ray structural model coordinates used to create these views are: 1PIV (for 78), 2HWE (for 79), and 1C8M (for 80). On the left side of each panel, the inhibitors are shown as van der Waals surfaces, and the protein as a ribbon diagram. On the right side, the structures of the inhibitor alone are shown, from the same view, as ball and stick representations. See color insert.

A second-generation compound, (79) (WIN-54954), also advanced into clinical tests, but had disappointing efficacy in Phase II trials, probably because of extensive metabolism. Modification of the phenylisoxazole, guided by both structural and metabolic considerations (177), allowed the creation of a stable and potent antiviral, the third-generation compound (80) (WIN-63843, pleconaril, or Picovir) (178). This compound was evaluated in Phase III clinical trials and showed efficacy in humans. Oral dosing of virally infected patients with (80) three times daily decreased the average time needed to become free of cold symptoms from 10 days to between 8 and 9 days, and also reduced the duration of severe cold symptoms from 4.5 to 3.5 days (179). During the clinical studies to support the new drug application for (80), about a quarter of the clinical isolates (of rhinovirus present initially or during the treatment) were resistant to the compound. The majority of these resistant viruses had a single mutation at VP1 residue Ile98, which directly interacts with (80) bound to VP1 in wild-type virus. The clinical data also showed an elevation of hepatic cytochrome P450 levels in some patients during treatment with (80), raising concerns about potentially hazardous drug-drug interactions. ViroPharma sought and failed in early 2002 to gain the approval of the U.S. Food and Drug Administration for its new drug application for (80) for treatment of the common cold.

(79) WIN-54954

2.8 Phosphoryl Transferases 

Protein kinases and phosphatases play vital roles in intracellular signaling pathways and in the integration and control of major cellular processes. Kinases and other phosphoryl group transferases are essential in the metabolism of lipids, nucleotides, and other small biomolecules. The use of SBDD methods on such targets has expanded as more of their X-ray structures have been solved, and will continue to grow as more targets are validated for their involvement in human diseases.

2.8.1 Mitogen-Activated Protein Kinase p38α.

Mitogen-activated protein kinase (MAPK) p38α is a member of a family of Ser/Thr-specific protein kinases that are activated upon exposure of cells to mitogens such as bacterial lipopolysaccharide or environmental stresses such as exposure to UV irradiation or chemical oxidants. MAPK p38α has a central role in integrating the inputs from a complex signaling network. Activation of MAPK p38α requires the dual phosphorylation of conserved threonine and tyrosine residues on a loop near the enzyme's active site (180). The unactivated (nonphosphorylated) enzyme has a very low affinity for ATP, but can bind to pyridinyl-imidazole inhibitors (181, 182). The activated enzyme in turn phosphorylates numerous substrates, including several transcription factors. This leads to activation of the transcription of many genes and causes the release of proinflammatory cytokines, primarily interleukin-1β (IL-1β) and tumor necrosis factor (TNFα). MAPK p38α was identified as a central player in this inflammatory pathway in a key study by scientists at SmithKline Beecham (183). The study involved the molecular cloning of the genes encoding proteins that bind to anti-inflammatory pyridinyl-imidazole compounds already known to block the biosynthesis of IL-1β and TNFα. The binding proteins turned out to be members of a known kinase family. Since this finding, the enzymes in the MAPK pathway, and especially MAPK p38α, have been attacked by many scientists seeking to discover anti-inflammatory drugs (184).

Compound (81) (SB203580), a specific inhibitor of MAPK p38α, is a prototype for the pyridinyl-imidazole compounds (185). This compound is active in animal models of several inflammatory diseases (186), but was not itself pursued as a clinical candidate because of its inhibition of other enzymes, including hepatic cytochrome P450 reductases. The pyridinyl-imidazole compounds have dissociation constants for MAPK p38α in the nanomolar range, competing with ATP for binding to the enzyme. Because these compounds bind tightly to the unactivated enzyme, which has a low affinity for ATP, they are able to compete effectively even in vivo, where the ATP concentration is in the millimolar range. The X-ray structures of (81) and several other pyridinyl-imidazole compounds in complexes with human MAPK p38α were solved in a collaborative effort between scientists at SmithKline Beecham and the University of Texas (187). Several X-ray structures of human MAPK p38α with and without bound inhibitors have also been solved by scientists at Vertex (181, 188).

(81) SB203580
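The competition argument can be made concrete with the textbook expression for two ligands competing for one site, in which the apparent inhibitor Kd is scaled by (1 + [ATP]/Kd,ATP). The sketch below is illustrative only: the function names are hypothetical, the numerical values in the example are assumed rather than measured values for MAPK p38α, and it presumes both ligands are in excess over enzyme.

```python
def apparent_kd(kd_inhibitor, atp_conc, kd_atp):
    """Apparent inhibitor Kd (mol/L) in the presence of competing ATP.

    Simple one-site competition: Kd_app = Kd_I * (1 + [ATP] / Kd_ATP).
    """
    if kd_inhibitor <= 0 or kd_atp <= 0 or atp_conc < 0:
        raise ValueError("constants must be positive and [ATP] non-negative")
    return kd_inhibitor * (1.0 + atp_conc / kd_atp)


def inhibitor_occupancy(inhibitor_conc, kd_inhibitor, atp_conc, kd_atp):
    """Fraction of enzyme bound by inhibitor, assuming free ligand ~ total."""
    kd_app = apparent_kd(kd_inhibitor, atp_conc, kd_atp)
    return inhibitor_conc / (inhibitor_conc + kd_app)


if __name__ == "__main__":
    # Assumed, illustrative numbers: 20 nM inhibitor Kd, 2 mM cellular ATP.
    weak_atp_site = apparent_kd(20e-9, 2e-3, 10e-3)   # ATP binds weakly
    tight_atp_site = apparent_kd(20e-9, 2e-3, 20e-6)  # ATP binds tightly
    print(f"weak ATP site:  Kd_app = {weak_atp_site * 1e9:.0f} nM")
    print(f"tight ATP site: Kd_app = {tight_atp_site * 1e9:.0f} nM")
```

With a weakly binding competitor the apparent Kd barely shifts, whereas a tightly binding competitor at the same concentration would raise it by about two orders of magnitude, which is the qualitative point made in the text about the unactivated enzyme.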

The structures of the inhibited enzyme were useful in understanding what parts of the compounds were responsible for strong binding to MAPK p38α. As shown in Fig. 10.18, both hydrophobic and hydrogen bonding interactions are important components of the inhibitor binding pocket. This structure suggested that Thr106 is an important structural determinant of the selective inhibition of MAPK p38α by the pyridinyl-imidazoles, which have low affinity for other closely related kinases. Mutation of Thr106 results in the loss of sensitivity to these inhibitors, whereas the replacement of the corresponding residue in another kinase (ERK2) by threonine caused the mutated variant to become sensitive to these inhibitors (189, 190).

The X-ray structural models were also used at both SmithKline Beecham (later GlaxoSmithKline) and Vertex to guide the design of new inhibitors. For example, both N1 of the central imidazole and the 2-(para-methylsulfonyl)-phenyl substituent in enzyme-bound (81) face a channel that opens to bulk solvent. This observation led to the design of (82) (VK19911) at Vertex (181) and (83) (SB242253) at GlaxoSmithKline (191). Compound (83) is fivefold more potent than (81) in vivo, in a mouse disease model, and was advanced into human clinical trials for treatment of rheumatoid arthritis (192). The piperidine on N1 of (82) and (83) was designed to form a salt bridge with Asp168. This interaction, and the preservation of other binding interactions, was directly demonstrated (181) for compound (82).

Analysis of the structural information from the X-ray models allowed the design at Vertex of a new scaffold for potent inhibition of MAPK p38α, as shown for compound (84) (VX-745). This design process, SBDD through use of a crystal structure of MAPK p38α to design potent inhibitors with potential utility as human therapeutics, is the subject of an international patent application by Vertex, published in 2000 (193). The binding mode for (84) has not been disclosed, but the compound was advanced into clinical trials (194). Vertex has since discontinued the clinical trials of (84) because of the potential for toxicity, based on animal data, but in mid-2002 Vertex began a phase I clinical study of a new compound targeted against MAPK p38α.

(83) SB242253

Figure 10.18. Binding of SB203580 (shown as a ball and stick structure) in the active site of MAPK p38α. In addition to the side chains of the labeled residues, the protein backbone between Leu104 and Met109 is shown, as well as several aliphatic side chains and a water molecule (red sphere). Hydrogen bonds (dotted lines) are shown between the backbone amide of Met109 and the inhibitor's pyridinyl nitrogen, and between the ε-amino of Lys53 and the inhibitor's imidazole N3. This figure is based on the PDB coordinate set 1A9U (187). See color insert.

Scientists at Boehringer Ingelheim recently described (195, 196) their discovery of an orally active inhibitor of MAPK p38α, compound (85) (BIRB-796), that is very different from earlier inhibitors. This compound, whose Kd for MAPK p38α is 100 picomolar, has entered phase II clinical trials for treatment of rheumatoid arthritis. The lead compound that led to compound (85) was a diaryl urea originally identified by high throughput screening. X-ray structural studies revealed novel modes of binding for both the lead compound and (85) in the active site of MAPK p38α. Their binding sites are adjacent to the active site but do not directly overlap with that of ATP; rather, their binding mode changes the conformation of MAPK p38α such that ATP cannot bind. The optimization of the lead compound to clinical candidate (85) was an iterative process using clever synthetic chemical design, biochemical assays for affinity, X-ray crystallographic studies of key complexes, and cell-based and animal models. The development of (85) as a MAPK p38α inhibitor with efficacy in vivo makes it evident that there are multiple ways to effectively inhibit this enzyme.

(85) BIRB-796

2.8.2 Purine Nucleoside Phosphorylase. 

Purine nucleoside phosphorylase (PNP) catalyzes the reversible phosphorolysis of purine nucleosides to the purine base and ribosyl or 2-deoxyribosyl-α-1-phosphate.

The vital role of PNP in the proliferation of T-cells is evident from the fact that people with an inherited deficit in this activity have 30- to 100-fold lower numbers of T-lymphocytes than normal (197). The accumulation of dGTP and the resulting inhibition of ribonucleotide reductase in PNP-deficient T-cells causes the suppression of T-cell proliferation. B-lymphocytes are unaffected. Hence, small molecule inhibitors of PNP could be used to treat T-cell lymphomas and other T-cell-mediated diseases such as psoriasis. Adjunct therapy with PNP inhibitors could also block the catabolism of therapeutically useful nucleoside analogs.

Human PNP is a homotrimer of 32-kDa subunits. The X-ray structures of the apoenzyme and some substrate analog complexes were described in 1990. Each of the three identical active sites, located near the subunit interfaces, is composed primarily of residues from one subunit, with Phe159 participating in the active site of the adjacent subunit (198).

Scientists at BioCryst, CIBA-Geigy, Southern Research Institute, and the University of Alabama collaborated to design inhibitors of human PNP (199, 200). The project used an iterative process, in which new compound design was guided by synthetic considerations, computer graphics analysis of X-ray structural models, computational (Monte Carlo and energy minimization) methods, and the inhibitory potency of the compounds against PNP in vitro. Evaluation of the most potent inhibitors by use of cell-based assays, followed by pharmacokinetic and pharmacological characterization of several inhibitors in animal models, led to the choice of (86) for advancement into clinical trials. Compound (86) (BCX-34, peldesine) is being evaluated for treatment of psoriasis and skin cancer (201, 202).

In the SBDD project that produced (86), the design work was initiated through use of the X-ray structure of the PNP apoenzyme, but was more successful when the structures of the PNP-guanine complex and other complexes were available (199). The PNP-guanine crystal structure showed no important interactions with N9, and indicated a potential for hydrophobic interaction in the vicinity of the substrate ribose (Fig. 10.19). To test this, the 9-deaza compound (87) was synthesized. This was a weak PNP inhibitor (measured IC50 = 1 μM).

(87)

The X-ray structure of the complex between PNP and (87) showed that the hydrophobic interaction dominated the binding mode, and resulted in the disruption of the hydrogen bonding interactions seen in the guanine complex (i.e., Fig. 10.19). To increase the spacing between the hydrophobe and the purine mimetic, compounds (88) and (89) were made. These had affinities for PNP in the nanomolar range. X-ray crystallographic analysis indicated new hydrogen bonding interactions with these 9-deaza compounds (shown for 89 in Fig. 10.20), made possible because N7 is protonated.

(89)




Figure 10.19. Binding interactions in the active site for the complex between guanine and PNP.

Figure 10.20. Binding interactions in the active site for the complex between PNP and (89).


While this work was under way, a Phase I clinical trial was undertaken of PNP inhibitor (90) (PD-119229), which was developed by other workers. This led to an exploration of a series of 8-amino, 9-deaza derivatives, although the hydrogen bonding for the simpler 9-deaza compounds turned out to be superior (reviewed in Ref. 202). It may be that compounds such as (90) suffer from unfavorable steric interactions between the 8-amino group and the proximal methyl group of Thr242 of the enzyme, or that the energetic cost of dehydrating the 8-amino group cannot be fully repaid by interactions with the enzyme. Other chemical series were also explored, but compound (86) had an acceptable safety profile and superior solubility and pharmacokinetic properties, and so was advanced into human testing.

(90)

2.9 Conclusions and Lessons Learned 

The projects in which SBDD has been applied to enable the discovery of new drugs and clinical candidates have provided significant lessons for future investigators. Some of these lessons are summarized here. Much of the credit for this summary belongs to Michael Varney of Agouron, who provided a copy of a presentation that he made in 1998 to a medicinal chemistry symposium concerning the lessons learned in 10 years of use of SBDD methods.

Experience Matters. In every aspect of SBDD, as in all technical fields, there is no substitute for experience. Given the variety of different techniques that must be incorporated, this means that experience from several different people will be needed for optimal function of a discovery project team. Essential expertise is needed in X-ray crystallographic studies, graphical display of experimental results, initial and iterative design of compounds and synthetic tactics, the creation of databases and database queries, and the analyses of search outputs and of the results of computational simulation experiments.

Combine and Integrate Technologies. Dedicated molecular biology and protein chemistry personnel and equipment are essential for identifying the right constructs for crystallization and for ensuring a steady supply of protein. Synthetic chemists trained in graphical analysis of protein structures tend to be excellent designers, and will be unlikely to design molecules that they cannot make. Early tactical integration of the synthetic approaches is even more important if combinatorial chemistry is part of the program. The structural information can be used to design combinatorial libraries as effectively as it can to design molecules one at a time. The use of libraries can compensate for the inaccuracies inherent in current computational scoring algorithms. More significantly, the integration of orthogonal technologies will stimulate creative thought and yield much more than the sum of the different technologies applied separately.

Go Big Early and Often. Filling active site space as much as possible will maximize the chance that a compound will be a potent inhibitor. During compound design, it should also be recognized that proteins are flexible, and that accessible conformations are hard to predict. Sometimes, larger functionality can be accommodated than the existing structural model permits. A few compounds should be included to probe this. These may give rise to an unexpected boon, such as access to a significantly altered new protein conformation with novel sites that can be exploited in new rounds of design and synthesis.

Aqueous Solubility is Critical to Success. Both early in SBDD and later on in clinical development, sufficient aqueous solubility is critical. Solubility is important early because the concentrations of compounds must be high during crystallization experiments to saturate the high levels of protein. The ratio of the solubility to the inhibition constant of a compound is also critical to the success of the crystallization experiment. Once some structural information becomes available, both parameters can be manipulated, but usually, soluble inhibitors must be available before the availability of structural information. Solubility matters during animal testing and later in development because compounds with very low solubility have limited or variable bioavailability.
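Why both solubility and the inhibition constant matter for crystallization can be illustrated with the exact solution for simple 1:1 binding. The sketch below is a simplified model under stated assumptions: the function name and all concentrations are hypothetical, and the total ligand concentration is taken to be capped at the compound's solubility.

```python
import math


def fraction_protein_bound(protein_total, ligand_total, ki):
    """Fraction of protein in complex for 1:1 binding (all values in mol/L).

    Solves [PL]^2 - (P + L + Ki)[PL] + P*L = 0 for the physically
    meaningful root, so it stays valid when protein exceeds ligand.
    """
    if protein_total <= 0 or ligand_total < 0 or ki <= 0:
        raise ValueError("protein and Ki must be positive, ligand non-negative")
    b = protein_total + ligand_total + ki
    complex_conc = (b - math.sqrt(b * b - 4.0 * protein_total * ligand_total)) / 2.0
    return complex_conc / protein_total


if __name__ == "__main__":
    protein = 0.5e-3  # assumed 0.5 mM protein in the crystallization drop
    ki = 1e-6         # assumed 1 uM inhibitor
    for solubility in (50e-6, 1e-3):  # assumed solubility limits
        bound = fraction_protein_bound(protein, solubility, ki)
        print(f"solubility {solubility * 1e6:.0f} uM -> {bound:.0%} of protein bound")
```

In this toy case a micromolar inhibitor that dissolves only to 50 μM cannot occupy more than about a tenth of the protein, however tightly it binds, whereas the same inhibitor at 1 mM nearly saturates it.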

Binding Sites Can Be Filled Many Ways. More than one small molecule scaffold can provide the necessary and sufficient hydrophobic and polar complementarity to generate potent inhibition. Sometimes, there are many scaffolds that will work. However, the structures of complexes with all the different scaffolds will likely have common features that are distinct from the structure of the apo enzyme, attributable to large-scale conformational changes that occur upon binding any ligand. The most useful X-ray models to use for the design of new compounds will be those that already have some substrate or inhibitor bound. There are several ways to design these compounds: modification of existing inhibitors, de novo creation of novel inhibitors, or some combination of these methods.

Not All Inhibitors Are Drugs. Having the X-ray structure of the target protein, or even having used the solved structure to design a potent inhibitor, is only the beginning of solving the difficult problems of drug design. The use of structure to create potent inhibitors can certainly shorten the time to get compounds into human testing, but use of SBDD methods does not guarantee that a potent compound will become a drug. This is an old lesson, actually, but is forgotten at great cost.

Structure of Free Inhibitor Is Important. Desolvation of the free ligand and desolvation of the protein's active-site groups upon complex formation are both significant. Both enthalpic and entropic contributions to the binding energy must be considered. Particular attention should be paid to the advantage that can be gained from "preorganization" of the inhibitor before binding, that is, low energy conformers bind with greater apparent avidity.

Bound Water Is Special, But Not All Hydrogen Bonds Are Created Equal. Each of the tightly bound waters present in an X-ray structural model has a unique environment and a unique function. In some cases, liberation of a bound water molecule by displacing it with an inhibitor's functionality can greatly increase inhibitor affinity, although this is not globally applicable. The entropic advantage of releasing a bound water into bulk solvent does not always exceed the enthalpic cost of the displacement. In many situations, the preferred solution will be to retain a water molecule and use it to maximize inhibitor binding. For example, a water molecule that donates two hydrogen bonds and accepts one cannot be isosterically replaced. Electrostatic interactions that are more complex than hydrogen bonds and simple ion pairs are very difficult to model, anticipate, and exploit in inhibitor design.

Retain Potency While Addressing Other Issues. Structural information can be very useful in designing compounds that are not part of a competitor's intellectual property, or that cannot be patented because of information in the public domain. Redesign of a compound that is not itself proprietary, by use of structural information obtained with that compound, can yield valuable new proprietary molecules. Structural information can also guide the modification of physicochemical, metabolic, or pharmacological properties or target selectivity without compromising the potency against the primary therapeutic target.

All Models Are Wrong; Some Are Useful. At present, it is impossible to calculate an accurate value for a binding constant on an absolute scale. However, accurately estimating the relative binding of a series of closely related compounds is possible, and is much more likely to be successful if X-ray structures of target complexes with some of the compounds are available. Thus, although there is much room for improvement, local computational models can sometimes be quite useful. Even in the absence of an experimentally determined X-ray structure of the target, a hypothetical model can be a powerful tool for the design of useful compounds (e.g., captopril and TAK-147).

Iterative SBDD Cycles Are Optimal. Small alterations in ligand structure often cause major changes in binding mode, protein conformation, or both. These changes can go undetected if the structural effects are not examined iteratively by X-ray analysis, or are examined too infrequently. This can yield confusing or misleading structure-activity relationships, leading to a waste of precious time. Moreover, changes in compound structure seldom affect only one variable, so multiple orthogonal methods should be used to assess the effects of changes. It is also important during the rational design process to include room for serendipity. Do not reject, on the basis of a single crystal structure or computational calculation, an idea for a new compound that seems to make intuitive sense.

REFERENCES 

1. D. J. Abraham, Intra-Sci. Chem. Rep., 8, 4 
(1974). 

2. http://www.agouron.com/ 

3. http://www.stromix.com/ 

4. http://www.astex-technology.com 

5. http://www.accelrys.com/consortia/htc/ 

6. P. J. Goodford, J. Med. Chem., 27,557 (1984). 

7. C. R. Beddell, Ed., The Design of Drugs to Macromolecular Targets, John Wiley & Sons, Chichester, UK, 1992.

8. J. Greer, J. W. Erickson, J. J. Baldwin, and 
M. D. Varney, J. Med. Chem., 37, 1035-1054 
(1994). 

9. R. E. Babine and S. L. Bender, Chem. Rev., 97, 
1359(1997). 

10. P. Veerapandian, Ed., Structure-Based Drug 
Design, Marcel Dekker, New York, 1997. 

11. R. T. Borchardt, R. M. Freidinger, T. K. Saw¬ 
yer, and P. L. Smith, Eds., Integration of Phar¬ 
maceutical Discovery and Development. Case 
Histories (Pharmaceutical Biotechnology, 
Band 11), Plenum Press, New York, 1998. 

12. K. Gubernator and H.-J. Bohm, Eds., Struc¬ 
ture-Based Ligand Design, Wiley-VCH, New 
York/Weinheim, 1998. 

13. C. L. Nobbs, H. C. Watson, and J. C. Kendrew, 
Nature, 209,339 (1966). 

14. M. F. Perutz, Nature, 228, 726 (1970). 

15. R. C. Ladner, E. J. Heidner, and M. F. Perutz, 
J. Mol. Biol., 114,385 (1977). 

16. G. Fermi, M. F. Perutz, B. Shaanan, and R. 
Fourme, J. Mol. Biol., 175, 159 (1984). 

17. B. C. Wishner, K. B. Ward, E. E. Lattman, and 
W. E. Love, J. Mol. Biol., 98, 179 (1975). 

18. D. J. Harrington, K. Adachi, and W. E. Royer 
Jr., J. Mol. Biol., 272, 398 (1997). 

19. C. R. Beddell, P. J. Goodford, G. Kneen, R. D. 
White, S. Wilkinson, and R. Wootton, Br. J. 
Pharmacol., 82,397 (1984). 

20. M. Merrett, D. K. Stammers, R. D. White, R. 
Wootton, and G. Kneen, Biochem. J.,239,387 
(1986). 





21. F. C. Wireko and D. J. Abraham, Proc. Natl. 
Acad. Sci. USA , 88,2209 (1991). 

22. D. J. Abraham, A S. Mehanna, F. C. Wireko, 
E. P. Orringer, J. Whitney, and R. P. Thomas, 
Blood, 77, 1334 (1991). 

23. M. K. Safo, S. Nokuri, and D. J. Abraham, Un¬ 
published results. 

24. P. E. Kennedy, F. L. Williams, and D. J. Abraham, J. Med. Chem., 27, 103 (1984).

25. D. J. Abraham, M. F. Perutz, and S. E. V. Phil¬ 
lips, Proc. Natl. Acad. Sci. USA, 80,324 (1983). 

26. M. F. Perutz, G. Fermi, D. J. Abraham, C. Po¬ 
yart, and E. Bursaux, J. Am. Chem. Soc., 108, 
1064 (1986). 

27. E. P. Orringer, D. S. Blythe, J. A Whitney, S. 
Brockenbrough, and D. J. Abraham, Am. J. 
Hematol., 39, 39(1992). 

28. D. J. Abraham, A. S. Mehanna, F. Williams, 
E. J. Cragoe Jr., and O. W. Woltersdorf Jr., 
J .Med. Chem., 32,2460 (1989). 

29. D. J. Abraham, P. E. Kennedy, A. S. Mehanna, 
D. Patwa, and F. L. Williams, J. Med. Chem., 
27,967 (1984). 

30. A. Arnone, Nature, 237, 146 (1972). 

31. V. Richard, G. G. Dodson, and Y. Mauguen, J, 
Mol. Biol., 233,270 (1993). 

32. P. J. Goodford, J. St-Louis, and R. Wootton, 
Br. J. Pharmacol., 68, 741 (1980). 

33. C. R. Beddell, P. J. Goodford,F. E. Norrington, 
S. Wilkinson, and R. Wootton, Br. J. Pharma¬ 
col., 57,201(1976). 

34. F. F. Brown and P. J. Goodford, Br. J. Pharmacol., 60, 337 (1977).

35. A S. Mehanna and D. J. Abraham, Biochemis¬ 
try, 29,3944 (1990). 

36. M. F. Perutz and C. Poyart, Lancet, 2, 881 
(1983). 

37. I. Lalezari and P. Lalezari, J. Med. Chem., 32, 
2352 (1989). 

38. I. Lalezari, P. Lalezari, C. Poyart, M. Marden, J. Kister, B. Bohn, G. Fermi, and M. F. Perutz, Biochemistry, 29, 1515 (1990).

39. D. J. Abraham, R. S. Randad, M. A. Mahran, 
and A. S. Mehanna, J. Med. Chem., 34, 752 
(1991). 

40. D. J. Abraham, F. C. Wireko, R. S. Randad, C. 
Poyart, J. Kister, B. Bohn, J. F. Leard, and 
M. P. Kunert, Biochemistry, 31,9141 (1992). 

41. F. C. Wireko, G. E. Kellogg, and D. J. Abraham, J. Med. Chem., 34, 758 (1991).

42. D. J. Abraham, J. Kister, G. S. Joshi, M. C. 
Marden, and C. Poyart, J. Mol. Biol., 248,845 
(1995). 


43. M. K. Safo, C. M. Moure, J. C. Burnett, G. S. Joshi, and D. J. Abraham, Protein Sci., 10, 951 (2001).

44. S. K. Burley and G. A Petsko, FEBS Lett., 201, 
751(1986). 

45. S. K. Burley and G. A. Petsko, Science, 229,23 
(1985). 

46. M. Levitt and M. F. Perutz, J .Mol. Biol., 201, 
751 (1988). 

47. D. J. Abraham, M. K. Safo, T. Boyiri, R. E. 
Danso-Danquah, J. Kister, and C. Poyart, Bio¬ 
chemistry, 34,15006 (1995). 

48. M. P. Grella, R. Danso-Danquah, M. K Safo, 
G. S. Joshi, J. Kister, S. J. Hoffman, M. Mar¬ 
den, and D. J. Abraham, J. Med. Chem., 25, 
4726 (2001). 

49. A.M. Youssef, M. K. Safo, R. Danso-Danquah, 
G. S. Joshi, J. Kister, M. Marden, and D. J. 
Abraham, J. Med. Chem., 45,1184 (2002). 

50. J. A Walder, R. H. Zaugg, R. Y. Walder, J. M 
Steele, and I. M. Klotz, Biochemistry, 18,4265 
(1979). 

51. R. Chatterjee, E. V. Welty, R. Y. Walder, S. L. 
Pruitt, P. H. Rogers, A Arnone, and J. A. 
Walder, J. Biol. Chem., 261,9929 (1986). 

52. S. R. Snyder, E. V. Welty, R. Y. Walder, L. A 
Williams, and J. A Walder, Proc. Natl. Acad. 
Sci. USA, 84,7280 (1987). 

53. N. Komiyama, J. Tame, and K. Nagai, Biol. 
Chem., 377,543 (1996). 

54. T. Boyiri, M. K. Safo, R. E. Danso-Danquah, J. 
Kister, C. Poyart, and D. J. Abraham, Bio¬ 
chemistry, 34,15021 (1995). 

55. M. F. Perutz, Br. Med. Bull., 32, 195 (1976).

56. J. Monod, J. Wyman, and J.-P. Changeux, J . 
Mol. Biol., 12, 88 (1965). 

57. D. A Matthews, R. A. Alden, J. T. Bolin, S. T. 
Freer, R. Hamlin, N. Xuong, J. Kraut, M. Poe, 
M. Williams, and K. Hoogsteen, Science, 197, 
452 (1977). 

58. L. F. Kuyper, B. Roth, D. P. Baccanari, R. Fer- 
one, C. R. Beddell, J. N. Champness, D. K. 
Stammers, J. G. Dann, F. E. Norrington, D. J. 
Baker, and P. J. Goodford, J. Med. Chem., 25, 
1120 (1982). 

59. D. A. Matthews, J. T. Bolin, J. M. Burridge, 
D. J. Filman, K. W. Volz, and J. Kraut, J. Biol. 
Chem., 260,392 (1985). 

60. K. Appelt, R. J. Bacquet, C. A. Bartlett, C. L. J. Booth, S. T. Freer, M. A. Fuhry, M. R. Gehring, S. M. Herrmann, E. F. Howland, C. A. Janson, T. R. Jones, C. C. Kan, V. Kathardekar, K. K. Lewis, G. P. Marzoni, D. A. Matthews, C. Mohr, E. W. Moomaw, C. A. Morse, S. J. Oatley, R. C. Ogden, M. R. Reddy, S. H. Reich, W. S. Schoettlin, W. W. Smith, M. D. Varney, J. E. Villafranca, R. W. Ward, S. Webber, S. E. Webber, K. M. Welsh, and J. White, J. Med. Chem., 34, 1925 (1991).

61. S. H. Reich and S. E. Webber, Perspect. Drug 
Discov. Des., 1, 371-390 (1993). 

62. L. W. Hardy, J. S. Finer-Moore, W. R. Mont- 
fort, M. O. Jones, D. V. Santi, and R. M. Stroud, 
Science, 235,448-455(1987). 

63. D. A Matthews, K. Appelt, S. J. Oakley, and 
N. H. Xuong, J. Mol. Biol., 214, 923-936 
(1990). 

64. W. R. Montfort, K. M. Perry, E. B. Fauman, 
J. S. Finer-Moore, G. F. Maley, L. Hardy, F. 
Maley, and R. M. Stroud, Biochemistry, 29, 

6964-6977 (1990). 

65. Y.Takemura and A L. Jackman, Anticancer 
Drugs, 8 , 3-16 (1997). 

66. S. E. Webber, T. M. Bleckman, J. Attard, J. G. 
Deal, V. Kathardekar, K. M. Welsh, S. Webber, 
C. A. Janson, D. A. Matthews, W. W. Smith, 
S. T. Freer, S. R. Jordan, R. J. Bacquet, E. F. 
Howland, C. L. J. Booth, R. W. Ward, S. M. 
Herrmann, J. White, C. A Morse, J. A. 
Hilliard, and C. A Bartlett, J.Med. Chem., 36, 
733-746 (1993). 

67. I. Niculescu-Duvaz, Curr. Opin. Invest. Drugs, 

2, 693-705 (2001). 

68. P. J. Goodford, J.Med. Chem., 28, 849(1985). 

69. P. Goodford, J. Chemom., 10,107(1996). 

70. M.D. Varney, G. P. Marzoni, C. L. Palmer, 
J. G. Deal, S Webber, K. M. Welsh, R. J. Bac¬ 
quet, C. A Bartlett, C. A. Morse, C. L. Booth, 
S. M. Herrmann, E. F. Howland, R. W. Ward, 
and J. White, J. Med. Chem., 35, 663-676 
(1992). 

71. D. R Newell, Semin. Oncol., 26 (Suppl. 6), 

74-81 (1999). 

72. P. Norman, Curr. Opin. Invest. Drugs, 2, 

1611-1622 (2001). 

73. E. C. Taylor, Adv. Exp. Med. Biol., 338, 387- 
408 (1993). 

74. G. P. Beardsley, B. A. Moroson, E. C. Taylor, 
and R. G. Moran, J. Biol. Chem., 264, 328-333 
(1989). 

75. J. R. Piper, G. S. McCaleb, J. A. Montgomery, 

R. L. Kisliuk, Y. Gaumont, J. Thorndike, and 
F. M. Sirotnak, J.Med. Chem., 31,2164-2169 
(1988). 

76. S. E. Greasley, T. H. Marsilje, H. Cai, S. Baker, 

S. J. Benkovic, D. L. Boger, and I. A Wilson, 
Biochemistry, 40,13538-13547(2001). 


77. R. J. Almassy, C. A. Janson, C. C. Kan, and Z. 
Hostomska, Proc. Natl. Acad. Sci. USA, 89, 

6114-6118 (1992). 

78. C. C. Kan, M. R. Gehring, B. R Nodes, C. A 
Janson, R. J. Almassy, and Z. Hostomska, J. 
Protein Chem., 11,467-473 (1992). 

79. M. D. Varney, C. L. Palmer, W. H. Romines 
3rd, T. Boritzki, S. A. Margosiak, R. Almassy, 
C. A. Janson, C. Bartlett, E. J. Howland, and R 
Ferre, J. Med. Chem., 40,2502-2524(1997). 

80. C. Shih, L. S. Gossett, J. F. Worzalla, S. M. 
Rinzel, G. B. Grindey, P. M. Harrington, and 
E. C. Taylor, J. Med. Chem., 35, 1109-1116 
(1992). 

81. D. W. Cushman, H. S. Cheung, E. F. Sabo, and 
M. A Ondetti, Biochemistry, 16,5484 (1977). 

82. M. A. Ondetti, B. Rubin, and D. W. Cushman, 
Science, 196, 441 (1977). 

83. M. J. Wyvratt and A A. Patchett, Med. Res. 

Rev., 5,483-531(1985). 

84. D. W. Cushman and M. A. Ondetti, Hyperten¬ 
sion, 17,589 (1991). 

85. D. W. Cushman and M. A. Ondetti, Nat. Med., 

5,1110(1999). 

86. J. Rahuel, V. Rasetti, J. Maibaum, H. Rueger, 
R. Goschke, N. C. Cohen, S. Stutz, F. Cumin, 
W. Fuhrer, J. M. Wood, and M. G. Grutter, 
Chem. Biol., 7, 493-504 (2000). 

87. L. D. Byers and R. Wolfenden, Biochemistry, 
12,2070-2078(1973). 

88. J.R Huff and J. Kahn, Adv. Protein Chem., 56, 

213-251 (2001). 

89. A Wlodawer and J. Vondrasek, Annu. Rev. 
Biophys. Biomol. Struct., 27,249(1998). 

90. T. D. Meek, J. EnzymeInhib., 6, 65 (1992). 

91. A Wlodawer and J. W. Erickson, Annu. Rev. 
Biochem., 62,543(1993). 

92. R. Lapatto, T. Blundell, A. Hemmings, J. Overington, 
A. Wilderspin, S. Wood, J. R. Merson, 
P. J. Whittle, D. E. Danley, K. F. Geoghegan, et 
al., Nature, 342, 299-302 (1989). 

93. M. A. Navia, P. M. Fitzgerald, B. M. McKeever, 
C. T. Leu, J. C. Heimbach, W. K. Herber, I. S. 
Sigal, P. L. Darke, and J. P. Springer, Nature, 
337,615-620(1989). 

94. A Wlodawer, M. Miller, M. Jaskolski, B. K. 
Sathyanarayana, E. Baldwin, I. T. Weber, 
L. M. Selk, L. Clawson, J. Schneider, and S. B. 
Kent, Science, 245, 616-621 (1989). 

95. I. B. Duncan and S. Redshaw, Infect. Dis. 
Ther., 25, 27-47 (2002). 

96. A G. Tomasselli, M. K. Olsen, J. O. Hui, D. J. 
Staples, T. K. Sawyer, R. L. Heinrikson, and 
C. S. Tomich, Biochemistry, 29, 264-269 
(1990). 







97. M. Jaskolski, A. G. Tomasselli, T. K. Sawyer, 
D. J. Staples, R. L. Heinrikson, J. Schneider, 
S. B. Kent, and A. Wlodawer, Biochemistry, 30, 
1600-1609 (1991). 

98. M. W. Holladay, F. G. Salituro, and D. H. Rich, 
J. Med. Chem., 30, 374-383 (1987). 

99. F. G. Salituro, N. Agarwal, T. Hofmann, and 
D. H. Rich, J .Med. Chem., 30,286-295 (1987). 

100. D. J. Kempf, K. C. Marsh, D. A Paul, M. F. 
Knigge, D. W. Norbeck, W. E. Kohlbrenner, L. 
Codacovi, S. Vasavanonda, P. Bryant, X. C. 
Wang, N. E. Wideburg, J. J. Clement, J. J.Platt- 
ner, and J. Erickson, Antimicrob. Agents Che- 
mother., 35,2209-2214(1991). 

101. M. V. Hosur, N. T. Bhat, D. J. Kempf, E. T. 
Baldwin, B. Liu, S. Gulnik, N. E. Wideburg, 
D. W. Norbeck, K Appelt, and J. W. Erickson, 
J .Am. Chem. Soc., 116, 847-855 (1994). 

102. D. J. Kempf, H. L. Sham, K C. Marsh, C. A. 
Flentge, D. Betebenner, B. E. Green, E. Mc¬ 
Donald, S. Vasavanonda, A Saldivar, N. E. 
Wideburg, W. M. Kati, L. Ruiz, C. Zhao, L. 
Fino, J. Patterson, A. Molla, J. J. Plattner, and 

D. W. Norbeck, J. Med. Chem., 41, 602-617 
(1998). 

103. C. N. Hodge, P. E. Aldrich, L. T. Bacheler, 
C. H. Chang, C. J. Eyermann, S. Garber, M. 
Grubb, D. A. Jackson, P. K Jadhav, B. Korant, 
P. Y. Lam, M. B. Maurin, J. L. Meek, M. J. 
Otto, M. M. Rayner, C. Reid, T. R. Sharpe, L. 
Shum, D. L. Winslow, and S. Erickson- 
Viitanen, Chem. Biol., 3,301-314 (1996). 

104. B. D. Dorsey, R. B. Levin, S. L. McDaniel, J. P. 
Vacca, J. P. Guare, P. L. Darke, J. A Zugay, 

E. A Emini, W. A. Schleif, J. C. Quintero, J. H. 
Lin, I. W. Chen, M. K Holloway, P. M. D. 
Fitzgerald,M. G. Axel,D. Ostovic, P. S. Ander¬ 
son, and J. R. Huff, J. Med. Chem., 37, 3443- 
3451 (1994). 

105. J. P. Vacca, J. P. Guare, S. J. DeSolms, W. M. 
Sanders, E. A. Giuliani, S. D. Young, P. L. 
Darke, I. S. Sigal, W. A Schleif, J. C. Quintero, 
E. A. Emini, P. S. Anderson, and J. R. Huff, 
J .Med. Chem., 34,1228-1230 (1991). 

106. T. A. Lyle, C. M. Wiscount, J. P. Guare, W. J. 
Thompson, P. S. Anderson, P. L. Darke, J. A. 
Zugay, E. A. Emini, W. A. Schleif, J. C. Quin¬ 
tero, R. A. F. Dixon, I. S. Sigal, and J. R. Huff, 
J .Med. Chem., 34,1230-1233 (1991). 

107. M. K. Holloway, J. M. Wai, T. A. Halgren, P. M. 
Fitzgerald, J. P. Vacca, B. D. Dorsey, R. B. 
Levin, W. J. Thompson, L. J. Chen, S. J. 
deSolms, N. Gaffin, A. K. Ghosh, E. A. Giuliani, 
S. L. Graham, J. P. Guare, R. W. Hungate, 
T. A. Lyle, W. M. Sanders, T. J. Tucker, 
M. Wiggins, C. M. Wiscount, O. W. Woltersdorf, 
S. D. Young, P. L. Darke, and J. A. Zugay, 
J. Med. Chem., 38, 305-317 (1995). 

108. E. E. Kim, C. T. Baker, M. D. Dwyer, M. A 
Murcko, B. G. Rao, R. D. Tung, and M. A. Na- 
via, J .Am. Chem. Soc., 117,1181-1182(1995). 

109. S. W. Kaldor, V. J. Kalish, J. F. Davies 2nd, 

B. V. Shetty, J. E. Fritz, K Appelt, J. A. Bur¬ 
gess, K M. Campanale, N. Y. Chirgadze, D. K. 
Clawson, B. A. Dressman, S. D. Hatch, D. A. 
Khalil, M. B. Kosa, P. P. Lubbehusen, M. A. 
Muesing, A. K Patick, S. H. Reich, K S. Su, 
and J. H. Tatlock, J. Med. Chem., 40, 3979- 
3985 (1997). 

110. M. Moledina, M. Chakir, and P. J. Gandhi, J, 
Thromb. Thrombolysis, 12,141-149 (2001). 

111. J. Hauptmann, Eur. J. Clin. Pharmacol., 57, 
751-758 (2002). 

112. D. W. Banner and P. Hadvary, J. Biol. Chem., 
266,20085-20093 (1991). 

113. P. E. Sanderson and A M. Naylor-Olsen, Curr. 
Med. Chem., 5,289 (1998). 

114. J. P. Vacca, Curr. Opin. Chem. Biol., 4, 394 
( 2000 ). 

115. J. Hauptmann, B. Kaiser, M. Paintz, and F. 
Markwardt, Biomed. Biochim. Acta, 46, 445- 
453 (1987). 

116. H. Brandstetter, D. Turk, H. W. Hoeffken, D. 
Grosse, J. Sturzebecher, P. D. Martin, B. F. 
Edwards, and W. Bode, J. Mol. Biol., 226, 
1085-1099 (1992). 

117. N. H. Hauel, H. Nar, H. Priepke, U. Reis, J. M. 
Stassen, and W. Wienen, J. Med. Chem., 45, 
1757-1766 (2002). 

118. R. M. Friedlander, V. Gagliardini, H. Hara, 
K. B. Fink, W. Li, G. MacDonald, M. C. Fish¬ 
man, A. H. Greenberg, M. A. Moskowitz, and J. 
Yuan, J. Exp. Med., 185,933-940 (1997). 

119. B. Siegmund, H. A. Lehr, G. Fantuzzi, and 

C. A. Dinarello, Proc. Natl. Acad. Sci. US A,98, 
13249-13254 (2001). 

120. N. P. Walker, R. V. Talanian, K. D. Brady, L. C. 
Dang, N. J. Bump, C. R. Ferenz, S. Franklin, T. 
Ghayur, M. C. Hackett, L. D. Hammill, L. Her¬ 
zog, M. Hugunin, W. Houy, J. A. Mankovich, L. 
McGuiness, E. Orlewicz, M. Paskind, C. A. 
Pratt, P. Reis, A. Summani, M. Terranova, 
J. P. Welch, L. Xiong, A. Moller, D. E. Tracey, 

R. Kamen, and W. W. Wong, Cell, 78,343-352 
(1994). 

121. K P. Wilson, J. A. Black, J. A. Thomson, E. E. 
Kim, J. P. Griffith, M. A. Navia, M. A Murcko, 

S. P. Chambers, R. A. Aldape, S. A. Raybuck, 
and D. Livingstone, Nature, 370, 270-275 
(1994). 





122. R. Leung-Toung, W. Li, T. F. Tam, and K. Ka- 
rimian, Curr. Med. Chem.,9,979-1002 (2002). 

123. M. R. Michaelides and M. L. Curtin, Curr. 
Pharm. Des., 5, 787-819 (1999). 

124. P. D. Brown, Expert Opin. Invest. Drugs, 9, 
2167-2177 (2000). 

125. O.Santos, C. D. McDermott, R. G. Daniels, and 
K. Appelt, Clin. Exp. Metastasis, 15, 499-508 
(1997). 

126. L. J. MacPherson, E. K. Bayburt, M. P. Capparelli, 
B. J. Carroll, R. Goldstein, M. R. Justice, 
L. Zhu, S. Hu, R. A. Melton, L. Fryer, R. L. 
Goldberg, J. R. Doughty, S. Spirito, V. Blancuzzi, 
D. Wilson, E. M. O'Byrne, V. Ganu, and 
D. T. Parker, J. Med. Chem., 40, 2525-2532 
(1997). 

127. G. Clemens, B. Hibner, R. Humphrey, H. 
Kluender, and S. Wilhelm in N. J. Clendeninn 
and K. Appelt, Eds., Matrix Metalloproteinase 
Inhibitors in Cancer Therapy, Humana Press, 
Totowa, NJ, 2001, pp. 175-192. 

128. T. Kusumi, M. Tsuda, T. Katsunuma, and M. 
Yamamura, Cell Biochem. Funct., 7, 201-204 
(1989). 

129. M D. Sintchak, M. A. Fleming, O. Futer, S. A. 
Raybuck, S. P. Chambers, P. R. Caron, M. A. 
Murcko, and K. P. Wilson, Cell, 85, 921-930 
(1996). 

130. L. Hedstrom, Curr. Med. Chem., 6, 545-560 
(1999). 

131. M. D. Sintchak and E. Nimmesgern, Immuno- 
pharmacology, 47,163-184 (2000). 

132. D. A Gschwend, A. C. Good, and I. D. Kuntz, J. 
Mol. Recognit., 9,175-186 (1996). 

133. D. Dvornik, J. Diabetes Complications, 6, 
25-34 (1992). 

134. D. R. Tomlinson, E. J. Stevens, and L. T. Die- 
mel, Trends Pharmacol. Sci., 15, 293-297 
(1994). 

135. C. L. Kaul and P. Ramarao, Methods Find. 
Exp. Clin. Pharmacol., 23,465-475 (2001). 

136. A. Urzhumtsev, F. Tete-Favier, A. Mitschler, 
J. Barbanton, P. Barth, L. Urzhumtseva, J. F. 
Biellmann, A Podjarny, and D. Moras, Struc¬ 
ture, 5,601-612 (1997). 

137. M. C. Van Zandt, E. O. Sibley, K. J. Combs, 
E. E. McCann, B. Flam, D. J. Lavoie, D. 
Sawicki, A. Sabetta, A. Carrington, J. Sredy, V. 
Calderone, B. Cuevrier, and A. Podjarny, 
Poster presented at the 218th National Meeting 
of the American Chemical Society, New Orleans, 
LA, August 22-26, 1999. 

138. S. Borman, Chem. Eng. News, 80, 35-39 

( 2002 ). 


139. E. K. Perry, B. E. Tomlinson, G. Blessed, K. 
Bergmann, P. H. Gibson, and R. H. Perry, Br. 
Med. J., 2, 1457-1459 (1978). 

140. B. P. Imbimbo, CNS Drugs, 15, 375-390 

( 2001 ). 

141. Y. Ishihara, K. Kato, and G. Goto, Chem. 
Pharm. Bull. (Tokyo), 39,3225-3235 (1991). 

142. Y. Ishihara, G. Goto, and M. Miyamoto, Curr. 
Med. Chem., 7, 341-354 (2000). 

143. J. L. Sussman, M. Harel, F. Frolow, C. Oefner, 
A. Goldman, L. Toker, and I. Silman, Science, 
253,872-879(1991). 

144. Y. Yamamoto, Y. Ishihara, and I. D. Kuntz, 
J. Med. Chem., 37, 3141-3153 (1994). 

145. A. H. Reid, J. K. Taubenberger, and T. G. Fanning, 
Microbes Infect., 3, 81-87 (2001). 

146. J. N. Varghese, W. G. Laver, and P. M. Colman, 
Nature, 303, 35-40 (1983). 

147. M. von Itzstein, W.-Y. Wu, G. B. Kok, M. S. 
Pegg, J. C. Dyason, B. Jin, T. Van Phan, M. L. 
Smythe, H. F. White, S. W. Oliver, P. M. Col¬ 
man, J. N. Varghese, D. M. Ryan, J. M. Woods, 
R. C. Bethell, V. J. Hotham, J. M. Cameron, 
and C. R. Penn, Nature, 363,418-423(1993). 

148. P. Bossart-Whitaker, M. Carson, Y. S. Babu, 
C. D. Smith, W. G. Laver, and G. M. Air, J. Mol. 
Biol., 232,1069-1083 (1993). 

149. J. N. Varghese, V. C. Epa, and P. M. Colman, 

Protein Sci., 4,1081-1087 (1995). 

150. C. U. Kim, W. Lew, M. Williams, H. Liu, L. 
Zhang, S. Swaminathan, N. Bischofberger, 
M. S. Chen, D. Mendel, W. G. Laver, and R. C. 
Stevens, J. Am. Chem. Soc., 119, 681 (1997). 

151. Y. S. Babu, P. Chand, S. Bantia, P. Kotian, A. 
Dehghani, Y. El-Kattan, T. H. Lin, T. L. 
Hutchison, A J. Elliott, C. D. Parker, S. L. 
Ananth, L. L. Horn, G. W. Laver, and J. A. 
Montgomery, J. Med. Chem., 43, 3482-3486 
( 2000 ). 

152. J. A Green, G. M. Smith, R. Buchta, R. Lee, 
K Y. Ho, I. A. Rajkovic, and K. F. Scott, In¬ 
flammation, 15, 355-367 (1991). 

153. P. Vadas, J. Browning, J. Edelson, and W. 
Pruzanski, J. Lipid Mediat., 8,1-30 (1993). 

154. C. Bennion, S. Connolly, N. P. Gensmantel, C. 
Hallam, C. G. Jackson, W. U. Primrose, G. C. 
Roberts, D. H. Robinson, and P. K. Slaich, 
J .Med. Chem., 35,2939-2951 (1992). 

155. S. Connolly, C. Bennion, S. Botterell, P. J. Cro- 
shaw, C. Hallam, K Hardy, P. Hartopp, C. G. 
Jackson, S. J. King, L. Lawrence, A. Mete, D. 
Murray, D. H. Robinson, G. M, Smith, L. Stein, 

I. Walters, E. Wells, and W. J. Withnall, 

J. Med. Chem., 45,1348-1362 (2002). 





156. H. G. Beaton, C. Bennion, S. Connolly, A R. 
Cook, N. P. Gensmantel, C. Hallam, K. Hardy, 

B. Hitchin, C. G. Jackson, and D. H. Robinson, 
J. Med. Chem., 37,557-559(1994). 

157. J. P. Wery, R. W. Schevitz, D. K. Clawson, J. L. 
Bobbitt, E. R. Dow, G. Gamboa, T. Goodson 
Jr., R. B. Hermann, R. M. Kramer, D. B. Mc¬ 
Clure, et al.. Nature, 352,79-82 (1991). 

158. R. W. Schevitz, N. J. Bach, D. G. Carlson, N. Y. 
Chirgadze, D. K. Clawson, R. D. Dillard, S. E. 
Draheim, L. W. Hartley, N. D. Jones, Mihelich, 
et al .,Nat. Struct. Biol., 2,458-465 (1995). 

159. D. L. Scott, S. P. White, J. L. Browning, J. J. 
Rosa, M. H. Gelb, and P. B. Sigler, Science, 

254,1007-1010(1991). 

160. M. M. Thunnissen, E. Ab, K. H. Kalk, J. 
Drenth, B. W. Dijkstra, O. P. Kuipers, R. Dijk- 
man, G. H. de Haas, and H. M. Verheij, Nature, 

347,689-691(1990). 

161. S. E. Draheim, N. J. Bach, R. D. Dillard, D. R. 
Berry, D. G. Carlson, N. Y. Chirgadze, D. K. 
Clawson, L. W. Hartley, L. M. Johnson, N. D. 
Jones, E. R. McKinney, E. D. Mihelich, J. L. 
Olkowski, R. W. Schevitz, A. C. Smith, D. W. 
Snyder, C. D. Sommers, and J. P. Wery, 
J. Med. Chem., 39,5159-5175 (1996). 

162. D. W. Snyder, N. J. Bach, R. D. Dillard, S. E. 
Draheim, D. G. Carlson, N. Fox, N. W. Roehm, 

C. T. Armstrong, C. H. Chang, L. W. Hartley, 
L. M. Johnson, C. R. Roman, A. C. Smith, M. 
Song, and J. H. Fleisch, J, Pharmacol. Exp. 
Ther., 288,1117-1124 (1999). 

163. D. M. Springer, Curr. Pharm. Des., 7,181-198 

( 2001 ). 

164. C. Savolainen, S. Blomqvist, M. N. Mulders, 
and T. Hovi, J. Gen. Virol., 83 (Pt 2), 333-340 
( 2002 ). 

165. M. G. Rossmann, Viral Immunol., 2,143-161 

(1989). 

166. M. G. Rossmann, E. Arnold, J. W. Erickson, 
E. A. Frankenberger, J. P. Griffith, H.-J. 
Hecht, J. E. Johnson, G. Kamer, M. Luo, A. G. 
Mosser, R. R. Rueckert, B. Sherry, and G. 
Vriend, Nature, 317,145-153 (1985). 

167. G. D. Diana, M. A McKinlay, M. J. Otto, V. 
Akullian, and C. Oglesby, J. Med. Chem., 28, 

1906-1910 (1985). 

168. G. D. Diana, M. A McKinlay, C. J. Brisson, 
E. S. Zalay, J. V. Miralles, and U. J. Salvador, 
J. Med. Chem., 28,748-752 (1985). 

169. M. J. Otto, M. P. Fox, M. J. Fancher, M. F. 
Kuhrt, G. D. Diana, and M. A McKinlay, An- 
timicrob. Agents Chemother., 27, 883-886 
(1985). 


170. M. P. Fox, M. J. Otto, and M. A. McKinlay, 
Antimicrob. Agents Chemother., 30, 110-116 
(1986). 

171. B. Jubelt, A. K. Wilson,S. L. Ropka, P. L. Guid- 
inger, and M. A. McKinlay, J.Infect. Dis., 159, 
866-871 (1989). 

172. T. J. Smith, M. J. Kremer, M. Luo, G. Vriend, 
E. Arnold, G. Kamer, M. G. Rossmann, M. A. 
McKinlay, G. D. Diana, and M. J. Otto, Sci¬ 
ence, 233,1286-1293(1986). 

173. D. C. Pevear, M. J. Fancher, P. J. Felock, M. G. 
Rossmann, M. S. Miller, G. D. Diana, A M. 
Treasurywala, M. A McKinlay, and F. J. 
Dutko, J. Virol., 63, 2002-2007 (1989). 

174. K. H. Kim, P. Willingmann, Z. X. Gong, M. J. 
Kremer, M. S. Chapman, I. Minor, M. A. Ol¬ 
iveira, M. G. Rossmann, K. Andries, G. D. Di¬ 
ana, F. J. Dutko, M. A. McKinlay, and D. C. 
Pevear, J. Mol. Biol., 230, 206-227 (1993). 

175. G. D. Diana, D. Cutcliffe, R. C. Oglesby, M. J. 
Otto, J. P. Mallamo, V. Akullian, and M. A 
McKinlay, J. Med. Chem., 32,450-455 (1989). 

176. G. D. Diana and D. C. Pevear, Antiviral Chem. 
Chemother., 8,401(2002). 

177. G. D. Diana, P. Rudewicz, D. C. Pevear, T. J. 
Nitz, S. C. Aldous, D. J. Aldous, D. T. Robin¬ 
son, T. Draper, F. J. Dutko, C. Aldi, et al., 
J. Med. Chem., 38,1355-1371 (1995). 

178. J. M. Rogers, G. D. Diana, and M. A. McKinlay, 
Adv. Exp. Med. Biol., 458, 69-76 (1999). 

179. F. G. Hayden, T. Coats, K. Kim, H. A. Hassman, 
M. M. Blatter, B. Zhang, and S. Liu, 
Antiviral Ther., 7, 53-65 (2002). 

180. B. Derijard, J.Raingeaud, T. Barrett, I.-H. Wu, 
J. Han, R. J. Ulevitch, and R. J. Davis, Science, 

267,682-685 (1995). 

181. K. P. Wilson, P. G. McCaffrey, K. Hsiao, S. 
Pazhanisamy, V. Galullo, G. W. Bemis, M. J. 
Fitzgibbon, P. R. Caron, M. A Murcko, and 
M. S. Su, Chem. Biol., 4,423-431 (1997). 

182. B. Frantz, T. Klatt, M. Pang, J. Parsons, A 
Rolando, H. Williams, M. J. Tocci, S. J. 
O’Keefe, and E. A. O’Neill, Biochemistry, 37, 

13846-13853 (1998). 

183. J. C. Lee, J. T. Laydon, P. C. McDonnell, T. F. 
Gallagher, S. Kumar, D. Green, D. McNulty, 
M. J. Blumenthal, J. R. Heys, S. W. Landvat- 
ter, J. E. Strickler, M. M. McLaughlin, I. R. 
Siemens, S. M. Fisher, G. P. Livi, J. R. White, 
J. L. Adams, and P. R. Young, Nature, 372, 
739-746 (1994). 

184. J. C. Lee, S. Kumar, D. E. Griswold, D. C. Un¬ 
derwood, B. J. Votta, and J. L. Adams, Immu- 
nopharmacology, 47,185-201 (2000). 






185. A. Cuenda, J. Rouse, Y. N. Doza, R. Meier, P. 
Cohen, T. F. Gallagher, P. R. Young, and J. C. 
Lee, FEBS Lett., 364,229-233(1995). 

186. A. M Badger, J. N. Bradbeer, B. Votta, J. C. 
Lee, J. L. Adams, and D. E. Griswold, J.Phar¬ 
macol. Exp. Ther., 279, 1453-1461 (1996). 

187. Z. Wang, B. J. Canagarajah, J. C. Boehm, S. 
Kassisa, M. H. Cobb, P. R. Young, S. Abdel- 
Meguid, J. L. Adams, and E. J. Goldsmith, 
Structure, 6,1117-1128(1998). 

188. K. P. Wilson, M. J. Fitzgibbon, P. R. Caron, 
J. P. Griffith, W. Chen, P. G. McCaffrey, S. P. 
Chambers, and M. S. Su, J. Biol. Chem., 271, 
27696-27700 (1996). 

189. T. Fox, J. T. Coll, X. Xie, P. J. Ford, U. A 
Germann, M. D. Porter, S. Pazhanisamy, M. A. 
Fleming, V. Galullo, M. S. Su, and K. P. Wil¬ 
son, Protein Sci., 7,2249(1998). 

190. R. J. Gum, M. M. McLaughlin, S. Kumar, Z. 
Wang, M. J. Bower, J. C. Lee, J. L. Adams, G. P. 
Livi, E. J. Goldsmith, and P. R. Young, J. Biol. 
Chem., 273,15605-15610 (1998). 

191. J. L. Adams, J. C. Boehm, T. F. Gallagher, S. 
Kassis, E. F. Webb, R. Hall, M. Sorenson, R. 
Garigipati, D. E. Griswold, and J. C. Lee, 
Bioorg. Med. Chem. Lett., 11, 2867-2870 
( 2001 ). 

192. T. Fullerton, A. Sharma, U. Prabhakar, M. 
Tucci, S. Boike, H. Davis, D. Jorkasky, and W. 
Williams, Clin. Pharmacol. Ther., 67, 114 

( 2000 ). 

193. Pat. Appl. Vertex Pharmaceuticals, Inc., as¬ 
signee, PCT WO 00/36096 (2000). 


194. J. J. Haddad, Curr. Opin. Invest. Drugs, 2, 
1070 (2001). 

195. C. Pargellis, L.Tong,L. Churchill, P. F. Cirillo, 
T. Gilmore, A G. Graham, P. M. Grob, E. R. 
Hickey, N. Moss, S. Pav, and J. Regan, Nat. 
Struct. Biol., 9, 268-272 (2002). 

196. J. Regan, S. Breitfelder, P. Cirillo, T. Gilmore, 
A. G. Graham, E. Hickey, B. Klaus, J. Madwed, 
M. Moriak, N. Moss, C. Pargellis, S. Pav, A 
Proto, A. Swinamer, L. Tong, and C. Torcel- 
lini, J. Med. Chem., 45,2994 (2002). 

197. G. R. Boss and J. E. Seegmiller, Annu. Rev. 
Genet., 16, 297-328 (1982). 

198. S. E. Ealick, S. A. Rule, D. C. Carter, T. J. 
Greenhough, Y. S. Babu, W. J. Cook, J. Ha- 
bash, J. R. Helliwell, J. D. Stoeckler, R. E. 
Parks Jr., S. Chen, and C. E. Bugg, J. Biol. 
Chem., 265,1812(1990). 

199. S. E. Ealick, Y. S. Babu, C. E. Bugg, M. D. 
Erion, W. C. Guida, J. A. Montgomery, and 
J. A Secrist 3rd, Proc. Natl. Acad. Sci. USA, 

88,11540-11544(1991). 

200. J. A. Montgomery, S. Niwas, J. D. Rose, J. A 
Secrist 3rd, Y. S. Babu, C. E. Bugg, M. D. 
Erion, W. C. Guida, and S. E. Ealick, J. Med. 
Chem., 36, 55-69 (1993). 

201. M. Duvic, E. A Olsen, G. A Omura, J. C. 
Maize, E. C. Vonderheid, C. A. Elmets, J. L. 
Shupack, M. F. Demierre, T. M. Kuzel, and 
D. Y. Sanders, J. Am. Acad. Dermatol., 44, 

940-947 (2001). 

202. P. E. Morris Jr. and G. A. Omura, Curr. 
Pharm. Des., 6, 943-959 (2000). 



CHAPTER ELEVEN 


X-Ray Crystallography 
in Drug Discovery 


Douglas A. Livingston 
Sean G. Buchanan 
Kevin L. D'Amico 
Michael V. Milburn 
Thomas S. Peat 
J. Michael Sauder 
Structural GenomiX 
San Diego, California 


Contents 

1 Introduction 
2 Methodology 
2.1 Theory 
2.2 Crystallization 
2.3 Data Collection 
2.4 Phase Problem 
2.5 Computing and Refinement 
2.6 Databases 
3 Applications of the Use of Crystallographic 
Studies in Drug Discovery and Development 
4 Structural Genomics 
4.1 Introduction to Structural Genomics 
4.2 Genome Annotation 
4.3 Pathways 
4.4 Protein Structure Modeling 
5 Conclusion 


Burger's Medicinal Chemistry and Drug Discovery 
Sixth Edition, Volume 1: Drug Discovery 
Edited by Donald J. Abraham 
ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc. 




1 INTRODUCTION 

The practice of crystallography is undergoing 
dramatic change because of the advent of new 
robotics technologies, orders-of-magnitude 
improvements in X-ray sources and computational 
power, and advances in protein production 
stemming from the recent revolution 
in molecular biology. This chapter covers 
these changes in the context of an overview of 
the techniques of modern crystallography, 
their application in the identification and 
characterization of targets and mechanisms 
for therapeutic intervention, and the nascent 
field of structural genomics. Structure-based 
drug design applications are covered elsewhere. 

The exponential growth in the rate of determination 
of new protein structures continues 
unabated. Technologies developed in the 
late 1980s (1) have now evolved to the point 
that they have been implemented in high-throughput 
format, driving the rate 
even higher. Super-intense, precise, tunable 
X-rays are now available from undulator 
beamlines. Three "third-generation" synchrotrons, 
designed and built for this purpose, are 
now on line (ESRF in Grenoble, France; 
SPring-8 in Japan; and APS at Argonne National 
Laboratory in the United States), and others 
are under construction. In a relative sense, 
this capability has had minimal impact on medicinal 
chemistry to date, but that will certainly 
change. The companies that have successfully 
built high-throughput protein 
crystallography systems (SGX and Syrrx in 
the United States and Astex in the U.K., 
among others) have all now turned their prodigious 
capacity to the co-crystallization of 
small molecules with target proteins for the 
purpose of drug discovery. The capacity to 
compare, in parallel, the binding modes of a 
set of hits from high-throughput screening 
(HTS), or of a given lead series, will 
be valuable, but an even greater impact will 
result from the decrease in turnaround time 
required to generate co-crystal structures. 
Slow turnaround has been the most significant 
hindrance to realizing the full potential of 
structure-based drug design. A structure is far 
more useful before the chemist has embarked on 
the synthesis of the next series than after. 

Another important development toward 
new target identification is the effort in large-scale 
structural annotation of various genomes, 
the field of structural genomics. In 
classifying proteins by function as a step toward 
validating them as therapeutic targets, 
structural homology is perhaps the most important 
tool available. These efforts have been 
taken up by a number of publicly funded consortia 
(2), because the commercial value of 
genomic databases in general has not been 
high enough to justify their cost in the private 
sector. Given that medicinal chemists think 
and communicate largely in structural terms, 
this recent growth in the influence of structural 
biology is very important. It forms the 
basis of a powerful link between chemistry 
and biology, and we have only begun to realize 
its potential. 

2 METHODOLOGY 
2.1 Theory 

X-ray crystallography provides atomic or near-atomic 
resolution of matter. The periodicity of 
crystals, reflecting the repeating units of molecular 
structure, diffracts X-rays according to 
Bragg's law: $n\lambda = 2d\sin\theta$, where $n$ is the order 
of diffraction, $\lambda$ the wavelength of the radiation, 
$d$ the spacing between members of a family 
of lattice planes in the crystal, and $\theta$ the 
angle of diffraction. X-radiation is ideal for 
analyzing atomic structure because the wavelengths 
used are on the order of 0.1-2.0 Å, with 
0.75 Å being about one-half the length of an 
aliphatic carbon-carbon bond. 
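
As a rough illustration of Bragg's law, the following Python sketch converts between lattice spacing and diffraction angle. The function names and the sample spacings are illustrative assumptions, not taken from the chapter; the 1.54 Å wavelength is the copper value quoted in Section 2.3.

```python
import math

def bragg_angle_deg(d_spacing, wavelength, order=1):
    """Bragg angle theta in degrees from n*lambda = 2*d*sin(theta).

    d_spacing and wavelength must share the same unit (here, angstroms).
    """
    sin_theta = order * wavelength / (2.0 * d_spacing)
    if sin_theta > 1.0:
        # No angle satisfies the law: the spacing is too fine for this wavelength.
        raise ValueError("no diffraction: n*lambda exceeds 2d")
    return math.degrees(math.asin(sin_theta))

def d_spacing_from_angle(theta_deg, wavelength, order=1):
    """Lattice spacing d from the Bragg angle, same units as wavelength."""
    return order * wavelength / (2.0 * math.sin(math.radians(theta_deg)))

CU_K_ALPHA = 1.54  # angstroms, copper anode wavelength quoted in the text

# Hypothetical spacings: large spacings diffract at small angles,
# which is the reciprocal relationship between crystal and pattern.
for d in (50.0, 10.0, 2.0):
    print(f"d = {d:5.1f} A  ->  theta = {bragg_angle_deg(d, CU_K_ALPHA):.2f} deg")
```

The loop makes the reciprocal-space behavior concrete: the larger the spacing between lattice planes, the smaller the angle at which the corresponding reflection appears.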

The images of diffracted crystal lattices can 
be observed with specialized precession photographic 
equipment, although the modern 
image plate detectors used in most laboratories 
produce a diffraction image that can be 
analyzed by computer to provide the indices of 
the lattice diffraction spots (Fig. 11.1, a-c). 

The X-ray diffraction from the electron 
clouds surrounding each nucleus is either reinforced 
or impeded, which gives rise to the differences 
in intensity observed in Fig. 11.1. 
The steps that one goes through to solve a 
crystal structure follow, with the intent of providing 
the non-crystallographer with a simplified 
and pictorial view of the process. 

Figure 11.1. (a) A two-dimensional crystal lattice diffraction pattern for a small-molecule 
natural product, MW 222. Each diffraction intensity in the lattice is numbered to give a unique 
three-dimensional address (identification) for that measurement. These numerical addresses are referred to as 
Miller indices or hkl values. (b) A diffraction pattern from a precession photograph for hemoglobin, MW 
65,000. Note that the diffraction lattice spacings are much smaller for the large molecule, which reflects the 
nature of Bragg's law, where the lattice is observed in reciprocal space ($1/d = 2\sin\theta / n\lambda$). (c) An image plate 
diffraction pattern for a protein. [Adapted with permission from D. J. Abraham, Computer-Aided Drug 
Design, Methods, and Applications, Marcel Dekker, Inc., New York, 1989.] 

2.2 Crystallization 

Crystallization is the first and most critical 
step, because good single crystals 
usually provide quality diffraction. Linus 
Pauling once entitled one of his lectures "The 
Importance of Being Crystalline" (3). Unfortunately, 
crystallization is still more empirical 
than scientific. It requires closely monitored 
matrix changes in growing conditions, i.e., pH, 
salt concentration, temperature, solvents, and 
crystallization setups. Most laboratories now 
use the well-known sparse matrix screens pioneered 
by Jancarik and Kim (4) and further 
refined and commercially distributed by 
Hampton Research (5, 6). Screens typically 
employ vapor diffusion experiments 
(hanging drops or sitting drops) and occasionally 
batch and liquid-liquid diffusion methods. 
More recently, batch crystallizations have 
been rejuvenated by the development of microbatch 
robots and by the groups of Chayen 
(7), DeTitta (8), and D'Arcy (9). 

Although discovering the crystallization 
conditions for a new protein or nucleic acid 
can be tedious, relatively inexperienced individuals 
can usually succeed at growing crystals 
once the initial conditions are established. 
Some of the most successful crystallization 
methodologies are based on vapor diffusion 
(Fig. 11.2). The general idea behind 
vapor diffusion crystallization is to dissolve 
the protein in a buffer containing a non-precipitating 
amount of the miscible vapor solvent, and 
to let this solution equilibrate with a nearby 
reservoir that holds a higher concentration of 
the vaporizing solvent. Another variation is to 
set up the crystallization cocktail containing 
salts, buffers, poly(ethylene glycols) (PEGs), 
small-molecule solvents, etc., so that the volume 
of the drop is slowly reduced as it equilibrates 
with the mixture placed nearby. McPherson, Carter, and others 
have developed more quantitative methods for 
optimizing crystal growth (10). 
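
To make the idea of screening "matrix changes in growing conditions" concrete, the sketch below enumerates a full grid of conditions and then picks a small subset to set up. Every factor name, level, and plate size here is a hypothetical placeholder. Note also that published sparse matrix screens are assembled from conditions that have crystallized proteins before, not by the uniform random sampling used here, which is only a stand-in.

```python
import itertools
import random

# Hypothetical factor levels, chosen only for illustration.
FACTORS = {
    "pH": [4.5, 5.5, 6.5, 7.5, 8.5],
    "salt_M": [0.0, 0.1, 0.2, 0.5],
    "peg_percent": [5, 10, 20, 30],
    "temperature_C": [4, 20],
}

def full_grid(factors):
    """Every combination of factor levels, one dict per condition."""
    names = list(factors)
    return [dict(zip(names, combo))
            for combo in itertools.product(*factors.values())]

def sparse_subset(conditions, n, seed=0):
    """Pick n distinct conditions; random sampling is an assumption here."""
    rng = random.Random(seed)  # fixed seed keeps the plate layout reproducible
    return rng.sample(conditions, n)

grid = full_grid(FACTORS)          # 5 * 4 * 4 * 2 = 160 conditions
screen = sparse_subset(grid, 48)   # e.g., two 24-well plates

print(f"{len(grid)} possible conditions, {len(screen)} selected")
for condition in screen[:3]:
    print(condition)
```

Even this small example shows why sparse screens are attractive: four factors with a handful of levels each already produce more conditions than most protein preparations can supply.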

2.3 Data Collection 

Most laboratories have rotating anode sources 
for the production of high-intensity X-ray beams. 
These are coupled with area detectors, which 
have made single-crystal diffractometers obsolete. 
Mirrors and other technology have also 
been used to provide a more intense and 
monochromatic radiation source (11). Radiation 
from rotating anode sources is at a fixed 
wavelength, usually produced by high-voltage 
electrons impinging on either a copper or a 
molybdenum rotating anode, i.e., radiation at 1.54 Å 
(copper) or 0.71 Å (molybdenum). Radiation 
from synchrotron sources can often be tuned 
to a wavelength of interest for multiwavelength 
anomalous diffraction (MAD) experiments 
(see below). 
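
Because synchrotron beamlines are often specified by photon energy and not by wavelength, it can help to convert between the two. The sketch below uses the standard relation E = hc/λ with hc rounded to 12.398 keV·Å; the function names are illustrative, and the two wavelengths are the copper and molybdenum values given above.

```python
HC_KEV_ANGSTROM = 12.398  # Planck constant times speed of light, in keV * angstrom (rounded)

def photon_energy_kev(wavelength_angstrom):
    """Photon energy in keV for a wavelength given in angstroms."""
    return HC_KEV_ANGSTROM / wavelength_angstrom

def wavelength_angstrom(energy_kev):
    """Wavelength in angstroms for a photon energy given in keV."""
    return HC_KEV_ANGSTROM / energy_kev

# Fixed wavelengths of the two common rotating anode targets named in the text.
for target, wavelength in (("copper", 1.54), ("molybdenum", 0.71)):
    print(f"{target}: {wavelength} A = {photon_energy_kev(wavelength):.2f} keV")
```

With these values the copper line falls near 8 keV and the molybdenum line near 17.5 keV, which gives a sense of the energy range a tunable source has to cover.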

X-rays generated by a synchrotron source 
are typically two orders of magnitude stronger 
than conventional Cu Kα radiation generated 
by a rotating anode. Synchrotron sources have 
greatly extended the ability to solve new protein 
structures when only weakly diffracting 
or small crystals are available. Another advantage 
of the stronger synchrotron radiation 
is that the crystal exposure time is significantly 
shorter. The typical exposure time for 
home laboratory Cu Kα sources ranges from 5 
to 60 min for a range of data, whereas the equivalent 
set of data at an undulator beamline, e.g., 
at the Advanced Photon Source (APS), requires 
only about 1 s of exposure time. Synchrotron 
radiation has also allowed the use of MAD, enabling 
phasing (imaging) of the protein using a 
derivative with only one heavy element. 

A variety of detectors are in common use to 
record X-ray data, and all have the advantage of 
measuring the intensities of large numbers of 
diffraction spots simultaneously. The most 
popular detectors are image plates and charge-coupled 
device (CCD) cameras. Image plates 
are typically the choice for laboratory rotating 
anode sources and lower flux synchrotron 
sources (Fig. 11.3). CCDs have the distinct advantage 
of speed at the higher flux synchrotron 
sources, because they simultaneously 
measure and record diffraction intensities 
(amplitudes). Current CCD cameras have 
readout times on the order of a few (typically 
2-8) seconds, a speed not dreamed of when the 
first protein structure data were recorded from 
photographs (with intensities measured by 
eye comparison to standard reference spots on 
a separate film strip). Speed of data collection 
can be an important advantage at third-generation 
synchrotron sources, with their even shorter 
exposure times. On the other hand, image 
plates have a greater range of use, being accessible 
in any X-ray diffraction laboratory, with 
many of the newer models taking less than 1 
min to record the intensity data. Image plate 
detectors often have more than one image plate, 
so one can be read while the other is exposed, 
effectively wasting no time during the collection 
period. Image plates also offer a larger surface 
area for data collection than most CCD cameras 
and are considerably less expensive. 

X-ray diffraction data from crystals are 
collected either at room temperature or under 
cryogenic conditions at liquid nitrogen temperatures 
[around 100 K (-170°C)]. For room 
temperature data collection, crystals are normally 
mounted in thin-walled glass capillaries, 
with a small amount of mother liquor 
about 5 mm from the crystal. The mother liquor 
in the capillary is critical because protein 
crystals are 40-80% water; dried protein 
crystals do not diffract. The nearby mother 
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Drop: 50% 
protein, 

50% cocktail 

Figure 11.2. (a) The drops are typically 1-10 pA total volume, with between 100 and 1000 pA total 
volume of cocktail in the well. The smaller the drop size, the faster the equilibrium occurs, in general. 
There are a variety of plates now available in which to set up these vapor diffusion experiments, the 
most common being 24-well Limbro plates and 96-well microtiter plates. Several robots have been 
developed to automatically set up the crystallization experiments; although most are no faster than 
doing the same procedure by hand (particularly with a multi-channel pipettor), there can be other 
advantages (e.g., consistency and reducing repetitive stress syndrome). Once plates are set up, they 
arc? typically kept at a constant temperature and observed periodically under a microscope, (b and c) 
Progress in automating this aspect of characterization has occurred, and there are now imagers that 
will take high resolution, digital pictures of each drop in turn and store these for either manual or 
automated analysis, (d) Batch experiments are set up such that the protein is mixed with cocktail and 
there is little concentration or dilution to the sample over time. This can now be done in very high 
throughput and small scale: 50-200 nL drops under oil in 1536-well plates, for example. This kind of 
approach has been used to screen hundreds of conditions with small amounts of protein, which may 
allow for faster optimization later. One caveat is that small crystals don't necessarily lead to larger 
crystals later, and all structures to date have had crystals of greater than 10 microns in at least one 
dimension. 
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Figure 11.3. (a) Area detector showing the config¬ 
uration of the unit, (b) Area detector showing the 
face. 

liquor ensures that the crystal is bathed in the 
vapor of the mother liquor and prevents dry¬ 
ing during data collection. The majority of 
present day data collections in home laborato¬ 
ries and at synchrotrons are done under cryo¬ 
genic conditions, which allows high intensity 
X-radiation to be used without the crystal de¬ 
cay observed in room temperature data collec¬ 
tion. For cryogenic data collection, crystals are 
normally mounted in a thin fiber loop with a 
layer of suitable cryoprotectant solution (Fig. 


Figure 11.4. The crystals are manipulated by 
scooping them up with a small loop cf nylon that is 
glued to the end of a pin. Surface tension from the 
liquid will hold the crystal in the loop, but the crys¬ 
tal can also be held by using a loop that is smaller in 
size than the crystal of interest. This technique will 
work particularly well with fragile crystals, thin 
plates for example, that would normally fall apart in 
a capillary mount. Once the crystal is frozen, it is 
placed on an axis in line of both an X-ray source and 
a stream of nitrogen set to about 100,000 to keep the 
crystal frozen. The crystal is rotated in increments 
during the data collection procedure to collect a full 
data set (typically one or two degrees per frame, 
depending on the resolution limits, mosaicity of the 
crystal, unit cell lengths, etc.). 

11.4). The cryoprotectant forms alayer of non¬ 
crystalline glass around the crystal to protect 
it from freeze shock. Simple freezing of the 
crystal results in the formation of ice in the 
interior of the crystal and renders it useless. A 
quick perusal of the literature shows PEG, 
glycerol, sucrose, and 2-methyl-2,4-pentane 
diol (MPD) as the most popular cryopro- 
tectants. Oils, such as paraffin oil, have also 
been used successfully as cryoprotectants (12). 

2.4 Phase Problem 

X-ray diffraction measurements as described above provide only the
amplitudes of the diffracted waves. One must have the phase angles of
all measured waves relative to a common origin in order to image the
molecule using a Fourier analysis. Figure 11.5 illustrates the
differences in the phases and amplitudes for two reflections. The
solution of the phase problem that permitted the first image
reconstruction of a protein was discovered by Perutz using multiple
isomorphous replacement (MIR) (13).

Figure 11.5. Graphs showing the phase relationship of electromagnetic
radiation; panel (a) is labeled Amplitude. [Adapted with permission
from D. J. Abraham, Computer-Aided Drug Design, Methods, and
Applications, Marcel Dekker, Inc., New York, 1989.]
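
The role of the phases can be illustrated numerically. The following is
a minimal sketch, not taken from the chapter: it uses a hypothetical
one-dimensional "unit cell" with two atoms and NumPy's FFT as a stand-in
for the Fourier analysis, and shows that the amplitudes alone do not
return the original density.

```python
import numpy as np

def fourier_synthesis(amplitudes, phases):
    """Rebuild a 1D density from amplitudes and phases (phases in radians)."""
    structure_factors = amplitudes * np.exp(1j * phases)
    return np.fft.ifft(structure_factors).real

# Hypothetical 1D "unit cell" containing two atoms of different weight.
density = np.zeros(64)
density[10] = 8.0
density[40] = 6.0

F = np.fft.fft(density)
amplitudes = np.abs(F)   # the part a diffraction experiment measures
phases = np.angle(F)     # the part the experiment does not record

with_phases = fourier_synthesis(amplitudes, phases)
without_phases = fourier_synthesis(amplitudes, np.zeros_like(phases))

print("recovered with phases:   ", np.allclose(with_phases, density))
print("recovered without phases:", np.allclose(without_phases, density))
```

Setting every phase to zero is only one arbitrary wrong guess; the point
of the sketch is that some independent source of phase information (MIR,
MR, or MAD, as discussed in this section) is needed before a map can be
computed.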

The majority of the earliest structures were solved using MIR to phase
the maps. This requires soaking the crystals or co-crystallizing the
protein with two or more heavy atoms and hoping that these heavy atoms
bind in a specific way to the protein. It also requires that the
subsequent crystals are isomorphous with the native protein (i.e., no
changes in the unit cell or symmetry of the crystal). Although it is
possible to obtain phase information from a single heavy atom
derivative using additional information (e.g., anomalous scattering or
density modification), one often works diligently to get a second or
third derivative to improve the quality of the electron density
(Fourier) map.

Two other common methods are used to estimate phases in protein
crystallography: molecular replacement (MR), which uses the structural
motif of a homologous protein (14), and MAD from a single heavy
element (1).

MR methodology requires a model that is structurally homologous to the
protein that has been crystallized. Phasing is accomplished through a
six-dimensional search (a three-dimensional rotation search followed by
a three-dimensional translation search) using the model against the
crystal data. Molecular replacement is being employed more frequently
as the number of known structures has increased, which has made unique
structural motifs available for phasing. For highly homologous protein
structures, this method is usually straightforward and successful. For
marginal cases, the addition of some independent phase information,
single isomorphous replacement (SIR) or MIR, in combination with MR can
enhance the quality of the Fourier map.

MAD phasing is an alternate methodology for solving the phase problem.
MAD requires a single heavy atom with an anomalous scattering peak at a
wavelength where X-rays both at and near the spectral energy are
accessible. Data sets are collected at different wavelengths to
optimize the anomalous and dispersive signals from the heavy atom.
Certain beamlines have been designed with wavelengths that are tunable
"on the fly," and these are often referred to as MAD beamlines. MAD has
become the method of choice for rapid structure solution when
synchrotron radiation is available. The advantage of MAD phasing is
that one often needs only a single crystal to collect all of the data
necessary to solve the structure. Although data are collected at
multiple wavelengths (anywhere from two to five sets), data collection
is routinely completed in less than a few hours. It is very important
to collect the peak wavelength data set first, as it contains the
greatest anomalous signal and is often used alone to find the heavy
atom sites (Shake-and-Bake program, anomalous Patterson maps, etc.). If
the crystal degrades quickly in the beam, one can also employ
single-wavelength anomalous diffraction (SAD) phasing if a full data
set at the peak wavelength was successfully collected. SAD phasing
requires additional information, obtained by density modification, to
obtain interpretable electron density maps, but has been proven in many
instances to result in very high quality maps (15).
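
The wavelength dependence that MAD exploits is usually written in terms
of the scattering factor of the heavy atom. The decomposition below is
the standard textbook form and is not spelled out in the chapter; the
symbols f^0, f', and f'' are introduced here for illustration.

```latex
f(\lambda) = f^{0} + f'(\lambda) + i\,f''(\lambda)
```

Here f^0 is the normal, wavelength-independent scattering, f' is the
dispersive correction, and f'' is the anomalous (absorptive) correction.
Because f'' is largest at the peak wavelength, the peak data set carries
the greatest anomalous signal, while data sets at other wavelengths
contribute mainly through the change in f'.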

Many different heavy atoms have been used for MAD/SAD phasing, the most
popular being selenium. Selenium is incorporated into the amino acid
sequence of the protein by adding selenomethionine to the growth media
when the protein is produced (16). For proteins that bind DNA,
5-bromouracil has been a popular choice for phasing through anomalous
scattering. Most heavy elements have good anomalous signals (Hg, Pt, U,
Au, etc.). Lanthanides have a particularly good signal and can
sometimes substitute for divalent metals found naturally in the protein
(e.g., Ca) (17). One of the major advantages of MAD phasing is that the
signal does not decay at higher resolution with perfectly isomorphic
crystals, so the experimentally phased map can be quite good out to the
full resolution of diffraction. This typically has not been the case
when using multiple isomorphous replacement, where the experimentally
phased map often extends only to around 2.5 Å resolution, because of a
lack of isomorphism between the native and heavy metal substituted
crystals. Anomalous scattering has been useful in the structure
determination of very large structures; the 30S ribosome was recently
solved using Os and Lu derivatives (18).

2.5 Computing and Refinement 

Raw X-ray intensity data are next reduced and scaled to provide
structure factors (F) that are used to solve and image the structure.
Two of the most popular software packages employed to reduce raw data
are Mosflm/CCP4 (19) and the HKL suite (20). Both work very well and
are very fast with modern computers. A variety of programs, such as
SOLVE (21), Shake and Bake (22), or SHELX (23), can be employed to find
the heavy atom positions, as can hand searching methods through
Patterson maps. Once heavy atom sites are found, they are usually
refined with the programs SHARP (24) or MLPHARE (19). The heavy atom
positions are next used as phase information input to provide initial
phases for electron density maps, which are used to fit the remainder
of the protein or nucleic acid. Once a model of the structure is
obtained, it is refined. In cases where high resolution data are
available, programs such as wARP (24) can automatically provide models
of protein structures. When high resolution data are not available, a
model is most often built by hand using graphics programs such as O
(25) or XFIT. The models are refined against the data by programs such
as REFMAC (19) and CNS (26). All of these programs have become much
faster and easier to use because of the incredible increases in speed
that new hardware has allowed.
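
As a rough illustration of the reduction step, the sketch below converts
a measured intensity and its standard uncertainty into a structure
factor amplitude. It is a simplification under stated assumptions, not
the procedure used by Mosflm/CCP4 or HKL: scaling is ignored, the
function name is hypothetical, and weak or negative intensities (which
real packages treat statistically) are simply set aside.

```python
import math

def intensity_to_amplitude(intensity, sigma_intensity):
    """Return (|F|, sigma_F) for a positive intensity, else None.

    Uses |F| = sqrt(I) and first-order error propagation,
    sigma_F = sigma_I / (2 |F|). Scaling factors are ignored.
    """
    if intensity <= 0.0:
        return None  # left for a proper statistical treatment
    amplitude = math.sqrt(intensity)
    sigma_amplitude = sigma_intensity / (2.0 * amplitude)
    return amplitude, sigma_amplitude

# Hypothetical reflection: I = 400 with sigma(I) = 20.
print(intensity_to_amplitude(400.0, 20.0))
print(intensity_to_amplitude(-3.0, 5.0))
```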

It is worth mentioning that statistical and probabilistic techniques
have had a significant impact on how heavy atoms are found and models
are refined (e.g., SHARP, SOLVE, REFMAC). Bayesian statistics and
maximum likelihood methods are now used instead of least-squares
methods. With this in mind, one may want to consider how various data
collection strategies affect the later steps in the process; for
example, high redundancy in the data makes for better statistics.
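
Why redundancy helps can be seen from a simple merging calculation. The
sketch below assumes independent measurements of the same reflection
with known uncertainties and combines them by inverse-variance
weighting; the function name and numbers are hypothetical, and real
scaling programs do considerably more.

```python
def merge_observations(observations):
    """Merge repeated (intensity, sigma) measurements of one reflection.

    Returns the inverse-variance weighted mean and its uncertainty.
    """
    weights = [1.0 / (sigma * sigma) for _, sigma in observations]
    total_weight = sum(weights)
    mean = sum(w * i for w, (i, _) in zip(weights, observations)) / total_weight
    sigma_mean = (1.0 / total_weight) ** 0.5
    return mean, sigma_mean

# Four hypothetical measurements, each with sigma = 10.
merged_intensity, merged_sigma = merge_observations([(98.0, 10.0), (104.0, 10.0),
                                                     (101.0, 10.0), (97.0, 10.0)])
print(merged_intensity, merged_sigma)
```

With equal uncertainties the merged uncertainty falls as the square root
of the number of measurements, so four observations halve it.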

The quality of a structure is measured in many ways: how low the R
factor or R free is (the fit of observed data to the model), the
resolution limit of the data, the ideality of the bonds and angles,
etc. How well a structure measures up to other structures of about the
same resolution also gives a good idea of how "good" a given structure
is (PROCHECK program). SFCHECK is a useful program for assessing the
agreement between the atomic model and the experimental X-ray data. The
level of confidence one expects from a given model will depend on the
resolution of the data. This can be seen clearly in Fig. 11.6, where a
residue from a protein structure is shown with three different data
cutoffs at different resolution ranges. The model from a 3.0-Å data set
may look the same as one from a 1.3-Å data set, but the level of
confidence is much higher in the latter. A reasonably well-refined
structure will have a crystallographic R factor between 15% and 25% and
will have an R free of less than 30% under most circumstances.
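
The R factor quoted above can be sketched as follows, using the
conventional definition R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|),
with R free computed from a subset of reflections held out of
refinement. The function names, the flagging scheme, and the numbers
are assumptions for illustration, and the calculated amplitudes are
taken to be already scaled to the observed ones.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R factor for paired amplitudes."""
    numerator = sum(abs(o - c) for o, c in zip(f_obs, f_calc))
    return numerator / sum(f_obs)

def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by a free-set flag and return (R work, R free)."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if not free]
    free = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor(*zip(*work))
    r_free = r_factor(*zip(*free))
    return r_work, r_free

# Hypothetical amplitudes; the last reflection belongs to the free set.
f_obs = [100.0, 50.0, 80.0, 20.0]
f_calc = [90.0, 55.0, 70.0, 25.0]
r_work, r_free = r_work_and_free(f_obs, f_calc, [False, False, False, True])
print(f"R work = {r_work:.3f}, R free = {r_free:.3f}")
```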

2.6 Databases 

The Protein Data Bank (PDB) (27, 28) is now coordinated by a consortium
of several institutions (Rutgers University, the San Diego
Supercomputer Center, and the National Institute of Standards and
Technology). As of this writing, the PDB has over 18,000 structures,
with over 15,000 of these done by X-ray crystallography. Most of the
rest were done by NMR. For small molecules, the Cambridge Structural
Database (CSD) (29) contains structural information for over 230,000
organic and organometallic compounds. All of these structures have been
determined by X-ray or neutron diffraction techniques.

Figure 11.6. Three density maps at differing resolutions: a, 1.3 Å; b,
2.1 Å; c, 3.0 Å. See color insert.

3 APPLICATIONS OF THE USE OF
CRYSTALLOGRAPHIC STUDIES IN DRUG
DISCOVERY AND DEVELOPMENT

Crystallization of small molecule compounds with a protein or nucleic
acid target followed by X-ray crystallographic determination of the
combined structure is the basis and hallmark of structure-based drug
design. As structural biology moves into the post-genomic age, many
companies and academic laboratories are faced with the challenge of
co-crystallization of targets and inhibitors or activators on a scale
never before attempted. Previously, crystal structure determination of
a protein-substrate or inhibitor complex in an academic or industrial
environment often yielded the structural information desired to
understand the mechanism of action or to design a more suitable
substrate or inhibitor. However, modern day laboratories are now faced
with the daunting challenge of crystallizing hundreds of compounds for
clues in further ligand design using standard organic synthesis or
combinatorial approaches.

A variety of methods have been employed to co-crystallize biological
molecules with small molecules. Discovery of crystallization conditions
is still an often tedious task, so newer methods for screening
crystallization conditions for proteins include the use of
semi-automated robots.

Two fundamental methods are available for co-crystallizations. One
method is termed "soaking." This employs the addition of the small
molecule directly to saturated solutions containing crystals of the
biological macromolecule in hopes that the ligand of interest will soak
directly into the crystal and bind to its active or binding site, so
that the co-crystal structure can be determined. The other method,
called co-crystallization, depends on having an ability to add the
ligand to the aqueous protein solution in at least stoichiometric
amounts, followed by crystallization using either the known
crystallization conditions or by setting up a new screen for
determining suitable crystallization conditions. Both methods have
disadvantages and advantages, and it is primarily up to the
investigator to decide which method, or both methods, should be
employed for their experiments.

One limitation to the soaking method is that the amount actually
dissolved and available to form the complex often cannot be easily
determined or controlled. In general, an excess of ligand, as a solid,
is added to the solution with the crystals of protein with the hope
that the ligand dissolves completely and will diffuse into the crystal
binding site. One method that has demonstrated success involves
equilibration of the crystal with a slightly higher concentration of
the crystal mother liquor that contains the ligand solubilized by
organic cosolvents (e.g., isopropanol, DMSO, ethanol) as part of the
medium for diffusion. However, higher levels of organic solvent often
decrease the resolution of diffraction. Lowering the level of solvent
after the addition of compound has been found to result in better
diffracting crystals. Another major limitation of this method is that
it is necessary to collect an entire X-ray diffraction data set to
determine if the small molecule is bound to the protein. This trial and
error method can be time-consuming or expensive if a high-throughput
crystallography approach is the objective.


Co-crystallization permits highly parallel screening for bound ligands
through robotic systems. The co-crystallization method is better suited
for high-throughput crystallography, because ligand binding can
sometimes be determined without the need to solve each structure.
Faster spectral analyses, or alternatives such as native gel shifts,
gel filtration, and mass spectrometry can provide information on which
of the crystals should be taken into X-ray studies. One difficulty in
using the co-crystallization method is the problem of determining the
concentration of the protein that is most suitable for complexation.
For example, if the protein solution is roughly 1 mM in high salt or
aqueous buffers, many organic molecules are not as soluble at that
level. In these cases a lower concentration of protein is usually
employed to attain stoichiometric ratios. As described above, small
percentages of organic solvent can be useful for increasing the
concentration of the organic compound in solution, but not without
affecting the protein stability or crystal quality. In general,
lowering the protein concentration sufficiently, followed by addition
of the appropriate amount of ligand, and then concentration of the
mixture to the desired protein concentration for crystallization is the
most successful method.
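
The arithmetic behind "the appropriate amount of ligand" can be sketched
as below. This is only an illustration: the function name, the 10 mM
stock in organic solvent, and the 1.2-fold molar ratio are hypothetical
choices, not values given in the chapter.

```python
def ligand_addition(protein_mM, protein_volume_uL, ligand_stock_mM, molar_ratio=1.0):
    """Return (stock volume in uL, percent of stock solvent in the mixture).

    protein_mM * protein_volume_uL gives nanomoles of protein, so the
    ligand stock volume follows directly from the chosen molar ratio.
    """
    ligand_nmol = molar_ratio * protein_mM * protein_volume_uL
    stock_volume_uL = ligand_nmol / ligand_stock_mM
    solvent_percent = 100.0 * stock_volume_uL / (protein_volume_uL + stock_volume_uL)
    return stock_volume_uL, solvent_percent

# Hypothetical case: protein diluted to 0.1 mM in 1000 uL, ligand stock 10 mM.
volume, percent = ligand_addition(0.1, 1000.0, 10.0, molar_ratio=1.2)
print(f"add {volume:.1f} uL of stock; solvent is {percent:.2f}% of the mixture")
```

Diluting the protein first keeps the required ligand concentration, and
with it the organic solvent fraction, low before the mixture is
concentrated for crystallization.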

Once conditions for obtaining the complexed protein have been
established, the next step is to decide which crystallization
conditions to use. In some cases, proteins that do not undergo large
tertiary structure changes when complexed to ligands can be
crystallized under conditions similar to those for the uncomplexed
protein. However, in some instances, proteins will change conformation,
depending on the type of ligand that they are complexed with, and a
large screening of possible new crystallization conditions is required.

In many cases, soaking a compound into a crystal is not possible
because of low solubility of the compound in the aqueous mother liquor.
Soaking experiments can also be limited when the conformational space
of the binding site is hindered, occupied by adjacent molecules in the
crystal lattice, or if there are conformational changes in the binding
site because of crystal packing effects. On the other hand,
co-crystallization of the protein and ligand, by the nature of the
process, usually requires more resources in terms of protein and
experimental time, leading to greater expense.

Soaking crystals and/or growing crystals in the presence of inhibitors
or ligands provides an opportunity to directly observe their binding
interactions, along with the often subtle conformational effects that
can have a profound effect on the mode of binding. When it is possible
to use this in an iterative fashion to guide the design of the next set
of compounds to be synthesized in a lead series, this becomes a potent
tool. Understanding how and why a compound or series binds to an active
site, particularly when the affinity is also known, provides the best
understanding, and the highest level at which it is possible to enable
drug design. As structural biology becomes a more integrated component
of drug discovery, better methods of obtaining crystal structures with
and without bound ligands will be developed, with lower costs and
faster turnaround times. Many companies and academic laboratories are
now focused on solving these challenges. But it is also impressive to
see how well we have progressed: Table 11.1 enumerates the structures
of known therapeutic targets (with or without ligands) that are
available in the public domain. This table is based on the Nature
Biotechnology "The Usual Suspects" poster, published in 2001, but is
almost identical in content to the 1997 version (30). Reference
sequences for 300 of the targets were extracted from NCBI GenBank, and
the nonredundant database was searched with several iterations of
PSI-BLAST. The resulting profile was used to search the PDB + SGX
database of known structures. The top hits for each drug target are
tabulated below. A great many more reside within pharmaceutical and
biotech companies as proprietary structures.

4 STRUCTURAL GENOMICS 

4.1 Introduction to Structural Genomics

Until recently, structure determination by protein crystallography was
a time-consuming method accessible to a few privileged skilled
practitioners. X-ray crystallography was reserved to tackle questions
requiring atomic resolution details of a demonstrably important
protein, often a drug target. Indeed, to this day, crystallography is
almost exclusively used in the pharmaceutical industry to study small
molecule interactions with drug targets (see Section 3).

The development of several new methods (described in Section 2) and the
availability of the complete genome sequences of both pathogens and
hosts provide an unparalleled opportunity to exploit protein structures
for drug discovery research in new ways. We can now contemplate using
protein structure determination to help annotate genomes, that is, to
assess new drug targets as well as provide multiple high-resolution
structures that address selectivity issues. This emerging science of
high-throughput structural biology has been termed structural genomics.

4.2 Genome Annotation 

It is in infectious disease that whole genome information first became
available, and it is in this field that structural genomics is having
an initial impact (373). A typical approach has been to assess the
viability and/or virulence of pathogens by systematic disruption of
every predicted gene product. As a consequence, a large number of
potential new targets have emerged: genes that are essential for
pathogenicity of bacteria in a model system. Often these new genes have
been filtered for those that are conserved in a variety of pathogens
and that do not have a close human homolog (374). About 30-50% of the
genes of a typical pathogen have no reliable functional assignment. A
similar fraction of the novel targets shown to be essential also fall
into this category, which becomes problematic for assay configuration.
Indeed, in target-based approaches, the number of leads emerging has
been disappointing. Protein structure can provide the information
required to prioritize among these essential genes and to establish
assays. Co-complex structures with even low-affinity hits can be used
to provide key information for medicinal chemistry.
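
The filtering described above (essential, conserved across pathogens,
no close human homolog) can be sketched as a simple selection step.
Everything in the example is hypothetical: the gene records, the field
names, and the thresholds are illustrative assumptions rather than
criteria taken from the cited studies.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    essential: bool               # disruption abolished viability or virulence
    pathogens_with_ortholog: int  # number of other pathogens carrying a homolog
    best_human_identity: float    # % identity to closest human protein (0 if none)

def prioritize(genes, min_pathogens=3, max_human_identity=30.0):
    """Keep essential, conserved genes lacking a close human homolog."""
    kept = [g for g in genes
            if g.essential
            and g.pathogens_with_ortholog >= min_pathogens
            and g.best_human_identity < max_human_identity]
    # Most widely conserved first; ties broken by lower human identity.
    return sorted(kept, key=lambda g: (-g.pathogens_with_ortholog, g.best_human_identity))

genes = [
    Gene("geneA", True, 5, 12.0),
    Gene("geneB", True, 4, 55.0),   # close human homolog, filtered out
    Gene("geneC", False, 6, 0.0),   # not essential, filtered out
    Gene("geneD", True, 6, 20.0),
]
shortlist = [g.name for g in prioritize(genes)]
print(shortlist)
```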

There are several ways in which structural genomics has promise as a
tool for genome annotation and target prioritization. For genes of
unknown function, structure can often provide clues to biochemical
function. Sequence homology has become a routine method for functional
assignment, but even the most

Table 11.1 Known Drug Targets with Published Structures

Target and PDB Reference | Resolution | Source | Homology | Year | Reference

Acetylcholinesterase
1MAH(A) | 3.20 Å | Green mamba | 88% | 1995 | (31)
1B41(A), 1F8U(A) | 2.76 Å, 2.90 Å | Green mamba | 99% | 2000 | (32)
1C2B(A), 1C2O(A) | 4.50 Å, 4.20 Å | Electric eel | 88% | 1999 | (33)
1MAA(A) | 2.90 Å | Mouse | 88% | 1998 | (34)

Adenosine deaminase
1FKX, 1FKW | 2.40 Å | Mouse | 82% | 1996 | (35)
1A4L(A), 1A4M(A) | 2.60 Å, 1.95 Å | Mouse | 83% | 1998 | (36)
1UIO, 1UIP | 2.40 Å | Mouse | 82% | 1996 | (37)
1ADD | 2.40 Å | - | 83% | 1992 | (38)
2ADA | 2.40 Å | - | 83% | 1994 | (39)

Alpha-amylase
1JXJ(A), 1JXK(A) | 1.90 Å | Human | 99% | 2001 | (40)
1SMD | 1.60 Å | Human | 99% | 1996 | (41)
1C8Q(A) | 2.30 Å | Human | 99% | 2000 | (42)
1CPU(A), 2CPU(A) | 2.00 Å | Human | 97% | 1999 | (43)
1BSI | 2.00 Å | Human | 97% | 1998 | (44)
1HNY | 1.80 Å | Human | 97% | 1995 | (45)
3CPU(A) | 2.00 Å | Human | 96% | 1999 | (46)
1B2Y(A) | 3.20 Å | Human | 97% | 1998 | (47)
1DHK(A) | 1.85 Å | Kidney bean | 86% | 1996 | (48)
1JFH | 2.03 Å | Pig | 86% | 1997 | (49)
1PIF, 1PIG | 2.30 Å, 2.20 Å | Pig | 85% | 1996 | (50)
1OSE | 2.30 Å | Pig | 86% | 1996 | (51)
1HX0(A) | 1.38 Å | Pig | 86% | 2001 | (52)
1BVN(P) | 2.50 Å | S. tendae | 85% | 1998 | (53)
1PPI | 2.20 Å | - | 86% | 1994 | (54)

Androgen receptor
1E3G(A) | 2.40 Å | Human | 100% | 2000 | (55)
1I37(A), 1I38(A) | 2.00 Å | Rat | 99% | 2001 | (56)

Anticoagulant protein C
1AUT(C) | 2.80 Å | Human | 100% | 1996 | (57)

Aquaporin 1
1IH5(A) | 3.70 Å | Human | 100% | 2001 | (58)
1FQY(A) | 3.80 Å | Human | 100% | 2000 | (59)

β-Amyloid
1MWP(A) | 1.80 Å | Human | 100% | 1999 | (60)

β-Lactamase [Sa]
1BTL | 1.80 Å | Bacteria | 100% | 1993 | (61)
1FQG(A) | 1.70 Å | Bacteria | 98% | 2000 | (62)
1JTD(A) | 2.30 Å | Bacteria | 99% | 2001 | (63)
1HTZ(A) | 2.40 Å | Bacteria | 98% | 2001 | (64)
1ERM(A), 1ERO(A) | 1.70 Å, 2.10 Å | Bacteria | 99% | 2000 | (65)
1ERQ(A) | 1.90 Å | Bacteria | 99% | 2000 | (65)
1XPB | 1.90 Å | Bacteria | 99% | 1997 | (66)
1ESU(A) | 2.00 Å | Bacteria | 98% | 2000 | (67)
1BT5(A) | 1.80 Å | Escherichia coli | 100% | 1998 | (68)
1TEM | 1.95 Å | Escherichia coli | 100% | 1996 | (69)
1CK3(A) | 2.28 Å | Escherichia coli | 99% | 1999 | (70)
1AXB | 2.00 Å | Escherichia coli | 100% | 1997 | (71)

β-Tubulin
1JEF(B) | 3.50 Å | Bovine | 99% | 2001 | (72)
1TUB(B) | 3.70 Å | Pig | 99% | 1997 | (73)
1FFX(B) | 3.95 Å | Rat | 99% | 2000 | (74)

Calcineurin A
1TCO(A) | 2.50 Å | Bovine | 99% | 1996 | (75)
1AUI(A) | 2.10 Å | Human | 99% | 1997 | (76)

Carbonic anhydrase 2
1HEA, 4CAC, 5CAC | 2.00 Å, 2.20 Å | Human, HSV-1 | 100% | 1992 | (30)
1G6V(A) | 3.50 Å | Arabian camel | 100% | 2000 | (77)
1CNW, 1CNX, 1CNY | 2.00 Å, 1.90 Å, 2.30 Å | Human | 100% | 1995 | (78)
1IF4(A), 1IF5(A), 1IF6(A) | 1.93 Å, 2.00 Å, 2.09 Å | Human | 100% | 2001 | (79)
1IF9(A) | 2.00 Å | Human | 100% | 2001 | (80)
1CA3, 1HEB, 1HED | 2.30 Å, 2.00 Å | Human | 100% | 1992 | (81)
1DCA, 1DCB | 2.20 Å, 2.10 Å | Human | 99% | 1993 | (82)
1CRA | 1.90 Å | Human | 100% | 1992 | (83)
1CIL, 1CIM, 1CIN | 1.60 Å, 2.10 Å | Human, HSV-1 | 100% | 1993 | (84)
1CAY | 2.10 Å | Human | 100% | 1993 | (85)
1RZA, 1RZB, 1RZC, 1RZD, 1RZE | 1.80 Å-1.90 Å | Human | 100% | 1993 | (86)
2CA2 | 1.90 Å | Human | 100% | 1989 | (87)
1BN1, 1BN3, 1BN4, 1BNM | 2.10 Å, 2.20 Å, 2.60 Å | Human | 100% | 1998 | (88)
1CAH | 1.88 Å | Human | 100% | 1992 | (89)



Table 11.1 (Continued)

Target and PDB Reference | Resolution | Source | Homology | Year | Reference

1I8Z(A) | 1.93 Å | Human | 100% | 2001 | (90)
1BV3(A) | 1.85 Å | Human | 100% | 1998 | (91)
12CA | 2.40 Å | Human | 99% | 1991 | (92)
1G53(A) | 1.94 Å | Human | 100% | 2000 | (93)
1AM6 | 2.10 Å | Human | 100% | 1997 | (94)
1CAN, 1CAO | 1.90 Å | Human rhinovirus | 100% | 1992 | (95)
1G0E(A), 1G0F(A) | 1.60 Å | Human | 99% | 2000 | (96)
1AVN | 2.00 Å | Human | 100% | 1997 | (97)
1UGF | 2.00 Å | Human | 99% | 1996 | (98)
1HVA | 2.30 Å | Human | 99% | 1992 | (99)
5CA2 | 2.10 Å | Human | 99% | 1991 | (100)
1HCA | 2.30 Å | - | 100% | 1992 | (101)
4CA2, 6CA2, 7CA2, 9CA2 | 2.10 Å-2.80 Å | Human | 100% | 1991 | (102)
1ZNC(A) | 2.80 Å | Human | 100% | 1996 | (103)

Catechol methyltransferase
1VID | 2.00 Å | Rat | 80% | 1996 | (104)

Cholecystokinin A receptor
1D6G(A) | NMR | - | 95% | 1999 | (105)

Coagulation factor 10
1EZQ(A), 1F0R(A), 1F0S(A) | 2.20 Å, 2.10 Å | Human | 100% | 2000 | (106)
1C5M(D) | 1.95 Å | Human | 99% | 1999 | (107)
1XKA(C), 1XKB(C) | 2.30 Å, 2.40 Å | Human | 100% | 1998 | (108)
1FAX(A) | 3.00 Å | Human | 98% | 1996 | (109)
1FJS(A) | 1.92 Å | Human | 100% | 2000 | (110)
1KIG(H) | 3.00 Å | Soft tick | 83% | 1997 | (111)
1HCG(A) | 2.20 Å | - | 100% | 1993 | (112)

Coagulation factor 2
1AI8(H) | 1.85 Å | Hirudo medicinalis | 100% | 1997 | (30)
1MKW(K), 1MKX(K) | 2.30 Å, 2.20 Å | Bos taurus | 84% | 1997 | (113)
1BTH(H) | 2.30 Å | Bovine | 99% | 1996 | (114)
1HXF(H) | 2.10 Å | Hirudo medicinalis | 100% | 1996 | (115)
1G30(B) | 2.00 Å | Hirudo medicinalis | 100% | 2000 | (116)
1A3E(H) | 1.85 Å | Hirudo medicinalis | 100% | 1998 | (117)
1D3P(B), 1D3Q(B) | 2.10 Å, 2.90 Å | Hirudo medicinalis | 100% | 1999 | (118)
1HDT(H) | 2.60 Å | Hirudo medicinalis | 100% | 1994 |
1AD8(H) | 2.00 Å | Hirudo medicinalis | 100% | 1997 |
1LHC(H), 1LHF(H), 1LHG(H) | 1.95 Å, 2.40 Å, 2.25 Å | Hirudo medicinalis | 100% | 1994 | (121)
1JOU(B) | 1.80 Å | Human | 99% | 2001 | (122)
1DIT(H) | 2.30 Å | Human | 100% | 1995 | (123)
1UVS(H) | 2.80 Å | Human | 100% | 1996 | (124)
4THN(H) | 2.50 Å | Human | 100% | 1998 | (125)
1THP(B) | 2.10 Å | Human | 99% | 1999 | (126)
1AY6(H) | 1.80 Å | Human | 100% | 1997 | (127)
1C1U(H), 1C1V(H) | 1.75 Å, 1.98 Å | Human | 100% | 1999 | (128)
1A4W(H) | 1.80 Å | Human | 100% | 1998 | (129)
1G37(A) | 2.00 Å | Human | 100% | 2000 | (130)
1E0J(A), 1E0L(A) | 2.10 Å | Human | 100% | 2000 | (131)
1BB0(B) | 2.10 Å | Human | 100% | 1998 | (132)
1C4U(2), 1D6W(A), 1D9I(A), 1DOJ(A) | 1.70 Å-2.30 Å | Human | 100% | 1999 | (133)
7KME(H) | 2.10 Å | Human | 100% | 1999 | (134)
1QBV(H) | 1.80 Å | Human | 100% | 1999 | (135)
1DM4(B) | 2.50 Å | Human | 99% | 1999 | (136)
1UMA(H) | 2.00 Å | Medicinal leech | 100% | 1996 | (137)
1BMM(H), 1BMN(H) | 2.60 Å, 2.80 Å | Medicinal leech | 100% | 1995 | (138)
1A2C(H) | 2.10 Å | M. aeruginosa | 100% | 1997 | (139)
1FPC(H) | 2.30 Å | - | 100% | 1994 | (140)
1NRO(H), 1NRR(H) | 3.10 Å, 2.40 Å | - | 100% | 1994 | (141)
1HAG(E) | 2.00 Å | - | 100% | 1994 | (142)
1HLT(H) | 3.00 Å | - | 100% | 1994 | (143)
1TMU(H) | 2.50 Å | - | 100% | 1994 | (144)
4HTC(H) | 2.30 Å | - | 100% | 1993 | (145)
1AIX(H), 1DWB(H), 1DWC(H) | 2.10 Å, 3.16 Å, 3.00 Å | Hirudo medicinalis | 100% | 1992 | (146)
2HPP(H) | 3.30 Å | - | 100% | 1993 | (147)
1ABI(H) | 2.30 Å | - | 100% | 1992 | (148)

Coagulation factor 7
1JBU(H) | 2.00 Å | Bacteria | 100% | 2001 | (149)

Coagulation factor 7a
1QFK(H) | 2.80 Å | Human | 100% | 1999 | (150)
1DVA(H) | 3.00 Å | Human | 100% | 2000 | (151)





Table 11.1 (Continued) 


Target and PDB Reference 

Resolution 

Source 

Homology 

Year 

Reference 

1DAN(H) 

2.00 A 

Human 

100% 

1997 

(152) 

1CVW(H) 

2.28 A 

Human 

100% 

1999 

(153) 

1FAK(H) 

2.10 A 

Human 

100% 

1998 

(154) 

Coagulation factor 9 

1RFNCA) 

2.80 A 

Human 

100% 

1999 

(155) 

1PFX(C) 

3.00 A 

Pig 

88% 

1995 

(156) 

Cox-1 

1DIY(A) 

3.00 A 

Sheep 

93% 

1999 

(157) 

1CQE(A), 1PRH(A) 

3.10 A, 3.50 A 

Sheep 

92% 

1994 

(158) 

1PTH 

3.40 A 

Sheep 

92% 

1995 

(159) 

1EBV(A) 

3.20 A 

Sheep 

93% 

2000 

(160) 

1FE2(A) 

3.00 A 

Sheep 

92% 

2000 

(161) 

1EQG(A), 1EQH(A), 1HT5(A), 1HT8(A) 

2.61 A-2.75 A 

Sheep 

92% 

2000 

(162) 

1PGE(A), 1PGF(A), 1PGG(A) 

3.50 A, 4.50 A 

Sheep 

92% 

1995 

(163) 

Cox-2 

1CVU(A), 1DDX(A) 

2.40 A, 3.00 A 

Mouse 

87% 

1999 

(164) 

1CX2, 3PGH, 4COX, 5COX, 6COX 

3.00 A 

Mouse 

87% 

1996 

(165) 

Cytochrome P450 reductase 

1B1C(A) 

1.93 A 

Human 

100% 

1998 

(166) 

1AM0(A) 

2.60 A 

Rat 

93% 

1997 

(167) 

1J9Z(A), 1JA0(A), 1JA1(A) 

2.70 A, 2.60 A, 1.80 A 

Rat 

92% 

2001 

(168) 

Dihydrofolate reductase 

1BOZ(A) 

2.10 A 

Human 

99% 

1998 

(169) 

1HFP, 1HFQ, 1HFR 

2.10 A 

Human 

100% 

1997 

(170) 

1OHJ, 1OHK 

2.50 A 

Human 

100% 

1997 

(171) 

1DR1, 1DR5, 1DR6, 1DR7 

2.20 A, 2.40 A 

— 

75% 

1992 

(172) 

1DR2, 1DR3 

2.30 A 

— 

75% 

1992 

(173) 

1DR4 

2.40 A 

— 

75% 

1992 

(174) 

1DHF(A), 2DHF(A) 

2.30 A 

— 

100% 

1989 

(175) 

1DLR, 1DLS 

2.30 A 

— 

99% 

1995 

(176) 

8DFR 

1.70 A 

— 

75% 

1989 

(177) 

1DRF 

2.00 A 

— 

100% 

1990 

(178) 

Dihydroorotate dehydrogenase 

1D3G(A), 1D3H(A) 

1.60 A, 1.80 A 

Human 

100% 

1999 

(179) 
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Dihydropteroate synthetase [Sa] 


1AD1(A), 1AD4(A) 

2.20 A, 2.40 A 

DNA helicase PcrA [Sa] 

1QHH(A) 

2.50 A 

DNA topoisomerase 1 

1EJ9(A) 

2.60 A 

1A36(A) 

2.80 A 

1A31(A), 1A35(A) 

2.10 A, 2.50 A 

Estrogen receptor 1 α 

1QKT(A), 1QKU(A) 

2.20 A, 3.20 A 

1HCP 

NMR 

1A52(A) 

2.80 A 

1ERR(A), 1ERE(A) 

2.60 A, 3.10 A 

1HCQ(A) 

2.40 A 

3ERT(A), 3ERD(A) 

1.90 A, 2.03 A 

FK506-binding protein 

1TC0(C) 

2.50 A 

1FKD, 2FKE 

1.72 A 

1FKJ, 1FKK, 1FKL 

1.70 A, 2.20 A 

1FAP(A) 

2.70 A 

3FAP(A), 4FAP(A) 

1.85 A, 2.80 A 

1NSG(A) 

2.20 A 

1FKR, 1FKS, 1FKT 

NMR 

1EYM(A) 

2.00 A 

1BL4(A) 

1.90 A 

1D60(A), 1D7H(A), 1D7I(A), 1D7J(A) 

1.85 A-1.90 A 

1QPF(A), 1QPL(A) 

2.50 A, 2.90 A 

1F40(A) 

NMR 

1B6C(A) 

2.60 A 

1A7X(A) 

2.00 A 

1BKF 

1.60 A 

2FAP(A) 

2.20 A 

1C9H(A) 

2.00 A 

1FKG, 1FKH, 1FKI(A) 

2.00 A, 1.95 A, 2.20 A 

1FKF 

1.70 A 

1FKB 

1.70 A 


S. aureus 


100% 


1997 


(180) 


B. thermophilus 

71% 

1999 

(181) 

Human 

99% 

2000 

(182) 

Human 

99% 

1998 

(183) 

Human 

90% 

1998 

(184) 

Human 

98% 

1999 

(185) 

Human 

100% 

1993 

(186) 

Human 

99% 

1998 

(187) 

Human 

99% 

1997 

(188) 

Human 

100% 

1993 

(189) 

Human 

98% 

1999 

(190) 

Bovine 

99% 

1996 

(75) 

Human 

100% 

1993 

(191) 

Cow 

97% 

1995 

(193) 

Human 

100% 

1996 

(194) 

Human 

100% 

1999 

(195) 

Human 

100% 

1997 

(196) 

Human 

100% 

1992 

(197) 

Human 

99% 

2000 

(198) 

Human 

99% 

1998 

(199) 

Human 

100% 

1999 

(200) 

Human 

100% 

1999 

(201) 

Human 

100% 

2000 

(202) 

Human 

100% 

1999 

(203) 

Human 

100% 

1998 

(204) 

Human 

98% 

1995 

(205) 

Human 

100% 

1998 

(206) 

Human 

83% 

1999 

(207) 

— 

100% 

1993 

(208) 

— 

100% 

1991 

(209) 

— 

100% 

1992 

(210) 

Table 11.1 (Continued) 


Target and PDB Reference 

Resolution 

Source 

Homology 

Year 

Reference 

Follicle stimulating hormone 

1FL7(B) 

3.00 A 

Human 

99% 

2000 

(211) 

GABA transferase 

1GTX(A) 

3.00 A 

Pig 

94% 

1999 

(212) 

Glucocorticoid receptor 

1LAT(A) 

1.90 A 

Rat 

85% 

1995 

(213) 

1GLU(A) 

2.90 A 

— 

94% 

1992 

(214) 

Glutamate receptor 1 

1EWK(A), 1EWT(A), 1EWV(A) 

2.20 A, 3.70 A, 4.00 A 

Rat 

98% 

2000 

(215) 

Glutathione peroxidase 

1GP1(A) 

2.00 A 

— 

90% 

1985 

(216) 

G-CSF3 

1CD9(A), 1PGR(A) 

2.80 A, 3.50 A 

Mouse 

98% 

1999 

(217) 

1BGC, 1BGD, 1BGE(A) 

1.70 A, 2.30 A, 2.20 A 

— 

80% 

1993 

(218) 

1GNC 

NMR 

— 

100% 

1994 

(219) 

1RHG(A) 

2.20 A 

— 

98% 

1993 

(220) 

Granulocyte-macrophage CSF 

1CSG(A) 

2.70 A 

— 

100% 

1992 

(30) 

2GMF(A) 

2.40 A 

Human 

100% 

1996 

(221) 

Growth hormone receptor 

1A22(B) 

2.60 A 

Human 

100% 

1998 

(222) 

1AXI(B) 

2.10 A 

Human 

98% 

1997 

(223) 

1HWG(B), 1HWH(B) 

2.50 A, 2.90 A 

Human 

100% 

1996 

(224) 

3HHR(B) 

2.80 A 

— 

100% 

1993 

(225) 

HIV reverse transcriptase 

1DLO 

2.70 A 

HIV-1 

98% 

1996 

(226) 

1RT3(B) 

3.00 A 

HIV-1 

99% 

1998 

(227) 

1HPZ, 1HQE, 1HQU 

3.00 A, 2.70 A 

HIV-1 

98% 

2000 

(228) 

1BQM, 1BQN 

3.10 A, 3.30 A 

HIV-1 

98% 

1998 

(229) 

1TVR(B), 1UWB(B) 

3.00 A, 3.20 A 

HIV-1 

99% 

1996 

(230) 

1EET 

2.73 A 

HIV-1 

98% 

2000 

(231) 

1IKV, 1IKW, 1IKX, 1IKY 

2.80 A -3.00 A 

HIV-1 

98% 

2001 

(232) 

1HVU(B) 

4.75 A 

HIV-1 

99% 

1998 

(233) 

1C1B(B) 

2HMI(B) 

1HYS 

1HMV 

1HNI 

1HNV 

1FKP(B) 

1JLA, 1JLB, 1JLC, 1JLE, 1JLF, 1JLG 
1J50, 1QE1(B) 

3HVT(B) 

Inosine monophosphate dehydrogenase 2 

1JR1(A) 

1B30(A) 

Insulin-like growth factor 1 

3LRI(A) 

1BQT 

1IMX(A) 

1B9G(A) 

2GF1, 3GF1 

Insulin-like growth factor 1 receptor 

1IGR(A) 

1GAG(A) 

1I44(A) 

1IR3(A) 

1IRK 

Insulin-like growth factor 2 

1IGL 

Integrin alpha M 
1BHQ(1), 1IDN(1) 

1JLM 

1IDO 

Intercellular adhesion molecule 1 

1IAM 

1IC1(A) 

1D3E(I), 1D3I(I), 1D3L(A) 


2.50 A 

2.80 A 
3.00 A 
3.20 A 
2.80 A 
3.00 A 
2.90 A 

2.50 A -3.00 A 

3.50 A, 2.85 A 
2.90 A 

2.60 A 

2.90 A 

NMR 

NMR 

1.82 A 

NMR 

NMR 

2.60 A 
2.70 A 
2.40 A 

1.90 A 

2.10 A 

NMR 

2.70 A 
2.00 A 

1.70 A 

2.10 A 

3.00 A 

2.80 A, 2.60 A, 3.25 A 


HIV-1 pol 

99% 

1999 

(234) 

Virus 

99% 

1998 

(235) 

Virus 

98% 

2001 

(236) 

Virus 

98% 

1994 

(237) 

Virus 

98% 

1995 

(238) 

virus 

98% 

1995 

(239) 

virus 

99% 

2000 

(240) 

virus 

99% 

2001 

(241) 

HIV-1 

99% 

1999 

(242) 

— 

99% 

1994 

(243) 

Chinese hamster 

98% 

2001 

(244) 

Human 

100% 

1998 

(245) 

Human 

86% 

1999 

(246) 

Human 

100% 

1998 

(247) 

Human 

100% 

2001 

(248) 

— 

81% 

1999 

(249) 

— 

100% 

1991 

(250) 

Human 

100% 

1998 

(251) 

Human 

78% 

2000 

(252) 

Human 

79% 

2001 

(253) 

Human 

78% 

1997 

(254) 

— 

79% 

1995 

(255) 

— 

100% 

1994 

(256) 

Human 

100% 

1998 

(257) 

Human 

100% 

1996 

(258) 

Human 

96% 

1996 

(259) 

Human 

98% 

1998 

(260) 

Human 

100% 

1998 

(261) 

Human rhinovirus 

100% 

1999 

(262) 

Table 11.1 (Continued) 


Target and PDB Reference 

Resolution 

Source 

Homology 

Year 

Reference 

Interferon α 1 

1ITF 

NMR 

Human 

82% 

1997 

(263) 

1RH2(A) 

2.90 A 

Human 

83% 

1996 

(264) 

Interferon γ 

1FG9(A) 

2.90 A 

Human 

100% 

2000 

(265) 

1FYH(A) 

2.04 A 

Human 

100% 

2000 

(266) 

1EKU(A) 

2.90 A 

Human 

98% 

2000 

(267) 

1HIG(A) 

3.50 A 

— 

99% 

1991 

(268) 

Interleukin 1 

2ILA 

2.30 A 

— 

99% 

1991 

(269) 

Interleukin 1 receptor 

1G0Y(R) 

3.00 A 

Human 

100% 

2000 

(270) 

1IPA(Y) 

2.70 A 

Human 

100% 

1998 

(271) 

1ITB(B) 

2.50 A 

Human 

100% 

1997 

(272) 

Interleukin 10 

1VLK 

1.90 A 

Epstein-Barr virus 

92% 

1997 

(273) 

2ILK 

1.60 A 

Human 

100% 

1996 

(274) 

1ILK 

1.80 A 

Human 

100% 

1995 

(275) 

1J7V(L) 

2.90 A 

Human 

100% 

2001 

(276) 

1INR 

2.00 A 

Human 

100% 

1995 

(277) 

Interleukin 12 

1F42(A), 1F45(A) 

2.50 A, 2.80 A 

Human 

100% 

2000 

(278) 

Interleukin 13 

1GA3(A) 

NMR 

Human 

100% 

2000 

(279) 

Interleukin 2 

1IRL 

NMR 

Human 

99% 

1995 

(280) 

3INK(C) 

2.50 A 

— 

99% 

1992 

(281) 

Interleukin 3 

1JLI 

NMR 

Escherichia coli 

87% 

1995 

(282) 

Interleukin 4 

1HIJ, 1HIK 

3.00 A, 2.60 A 

Human 

100% 

1995 

(283) 

1IAR(A) 

2.30 A 

Human 

100% 

1999 

(284) 

1HZI(A) 

2.05 A 

Human 

99% 

2001 

(285) 

1ITM 

NMR 

— 

100% 

1994 

(286) 

1BBN, 1BCN 

NMR 

— 

98% 

1992 

(287) 

1CYL 

NMR 

2CYK 

NMR 

1ITL 

NMR 

2INT 

2.40 A 

1RCB 

2.25 A 

1ITI 

NMR 

Interleukin 5 

1HUL(A) 

2.40 A 

Interleukin 6 

1IL6, 2IL6 

NMR 

1ALU 

1.90 A 

Interleukin 8 

1IKL, 1IKM 

NMR 

1ICW(A) 

2.01 A 

1ILP(A), 1ILQ(A) 

NMR 

1QE6(A) 

2.35 A 

1R0D(A) 

NMR 

3IL8 

2.00 A 

1IL8(A), 2IL8(A) 

NMR 

Leukotriene A4 hydrolase 

1HS6(A) 

1.95 A 

Lipocortin I 

1AIN 

2.50 A 

1HM6(A) 

1.80 A 

Luteinizing hormone β 

1QFW(B) 

3.50 A 

1HCN(B) 

2.60 A 

1HRP(B) 

3.00 A 

Macrophage CSF 1 

1HMC(A) 

2.50 A 

Neuraminidase [influenza B virus] 

1INF 

2.40 A 

1A4G(A), 1A4Q(A) 

2.20 A, 1.90 A 

1IVB 

2.40 A 

1NSC(A), 1NSD(A) 

1.70 A, 1.80 A 

1B9S(A), 1B9T(A), 1B9V(A) 

2.50 A, 2.40 A, 2.35 A 

1INV 

2.40 A 

1NSB(A) 

2.20 A 


Human 

Human 

Human 

Human 

Human 

Human 

Human 

Human 


Human 

Human 

Pig 

Escherichia coli 

Influenza B virus 
Influenza B virus 


100% 

100% 

100% 

100% 

100% 

98% 

100% 

100% 

100% 

100% 

97% 

100% 

97% 

79% 

100% 

100% 

100% 

100% 

89% 

83% 

83% 

83% 

100% 

100% 

94% 

99% 

94% 

99% 

99% 

94% 


1994 

1994 

1992 

1993 

1992 

1993 

1995 

1997 

1997 

1995 

1996 

1998 

1999 
1995 
1990 

1990 

2000 

1992 
2000 

1999 

1994 

1994 

1993 

1995 

1998 

1994 

1993 

1999 

1994 

1991 


(288) 

(289) 

(290) 

(291) 

(292) 

(293) 

(294) 

(295) 

(296) 

(297) 

(298) 

(299) 

(300) 

(301) 

(302) 

(303) 

(304) 

(305) 

(306) 

(307) 

(308) 

(309) 

(310) 

(311) 

(312) 

(313) 

(314) 

(315) 

(316) 

(317) 


Influenza B virus 

Table 11.1 (Continued) 


Target and PDB Reference 

Resolution 

Source 

Homology 

Year 

Reference 

Neuropeptide Y 

1RON 

NMR 

Human 

100% 

1996 

(318) 

1F8P(A) 

NMR 

— 

97% 

2000 

(319) 

1FVN(A) 

NMR 

— 

91% 

2000 

(320) 

Parathyroid hormone 

1HTH 

NMR 

Human 

88% 

1997 

(321) 

1FVY(A) 

NMR 

Human 

100% 

2000 

(322) 

1BWX, 1HPY, 1ZWA, 1ZWC 

NMR 

Human 

100% 

1998 

(323) 

1ET1(A) 

0.90 A 

Human 

100% 

2000 

(324) 

1HPH 

NMR 

— 

100% 

1995 

(325) 

1ZWB, 1ZWD, 1ZWE, 1ZWF, 1ZWG 

NMR 

— 

100% 

1996 

(326) 

PDGF β 

1PDG(A) 

3.00 A 

— 

100% 

1992 


Phospholipase A2 

1BCI 

NMR 

Human 

100% 

1998 

(327) 

1RLW 

2.40 A 

Human 

98% 

1997 

(328) 

1CJY(A) 

2.50 A 

Human 

100% 

1999 

(329) 

Potassium channel shaker 

1A68 

1.80 A 

Sea hare 

87% 

1998 

(330) 

1E0D(A), 1E0E(A), 1E0F(A) 

2.45 A, 1.70 A, 2.38 A 

Sea hare 

87% 

2000 

(331) 

1T1D(A) 

1.51 A 

Sea hare 

88% 

1998 

(332) 

1EXB(E) 

2.10 A 

Rat 

100% 

2000 

(333) 

1DSX(A), 1QDV(A), 1QDW(A) 

1.60 A, 2.10 A 

Rat 

94% 

2000 

(334) 

PPAR y 

4PRG(A) 

2.90 A 

Escherichia coli 

97% 

1999 

(335) 

1PRG(A), 2PRG(A) 

2.20 A, 2.30 A 

Human 

97% 

1998 

(336) 

1FM6(D), 1FM9(D) 

2.10 A 

Human 

99% 

2000 

(337) 

3PRG(A) 

2.90 A 

Human 

99% 

1998 

(338) 

Progesterone receptor 

1E3K(A) 

2.80 A 

Human 

100% 

2000 

(55) 

1A28(A) 

1.80 A 

Human 

100% 

1998 

(339) 

Prolactin receptor 

1BP3(B) 

2.90 A 

Human 

100% 

1998 

(340) 

1F6F(B) 

2.30 A 

Rat 

72% 

2000 

(341) 

Retinoic acid receptor 

1DSZ(A) 

1.70 A 

Human 

100% 

2000 

(342) 

1EXA(A), 1EXX(A) 

1.59 A, 1.67 A 

Human 

81% 

2000 

(343) 

2LBD 

2.00 A 

Human 

80% 

1997 

(344) 

3LBD, 4LBD 

2.40 A 

Human 

80% 

1998 

(345) 

1DKF(B) 

2.50 A 

Human 

100% 

1999 

(346) 

1FCX(A), 1FCY(A), 1FCZ(A) 

1.47 A, 1.30 A, 1.38 A 

Human 

83% 

2000 

(347) 

1HRA 

NMR 

— 

94% 

1993 

(348) 

Retinoid X receptor 

1FM6(A), 1FM9(A) 

2.10 A 

Human 

100% 

2000 

(337) 

1DSZ(B) 

1.70 A 

Human 

100% 

2000 

(342) 

1DKF(A), 1LBD 

2.50 A, 2.70 A 

Human 

99% 

1999 

(349) 

2NLL(A) 

1.90 A 

Human 

100% 

1996 

(350) 

1RXR 

NMR 

Human 

98% 

1998 

(351) 

1G1U(A), 1G5Y(A) 

2.50 A, 2.00 A 

Human 

100% 

2000 

(352) 

1FBY(A) 

2.25 A 

Human 

100% 

2000 

(353) 

1BY4(A) 

2.10 A 

Human 

100% 

1998 

(354) 

Serotransferrin B 

1JNF(A) 

2.60 A 

Rabbit 

78% 

2001 

(355) 

Stem cell factor 

1EXZ(A) 

2.30 A 

Human 

97% 

2000 

(356) 

1SCF(A) 

2.20 A 

Human 

88% 

1998 

(357) 

Thymidine kinase [HHV] 

1KIM(A) 

2.14 A 

HSV-1 

98% 

1997 

(358) 

10HI(A), 2KI5(A) 

1.90 A 

HSV-1 

98% 

1999 

(359) 

1KI2, 1KI3, 1KI4, 1KI6, 1KI7, 1KI8 

2.20 A -2.37 A 

HSV-1 

99% 

1998 

(360) 

1VTK, 2VTK, 3VTK 

2.75 A, 2.80 A, 3.00 A 

HSV-1 

98% 

1997 

(361) 

1E2H(A), 1E2I(A), 1E2J(A) 

1.90 A, 2.50 A 

— 

99% 

2000 

(362) 

1E2M(A), 1E2N(A), 1E2P(A) 

2.20 A, 2.50 A 

HSV-1 

99% 

2000 

(363) 

1E2K(A), 1E2L(A) 

1.70 A, 2.40 A 

— 

99% 

2000 

(364) 

Tumor necrosis factor receptor 1 

1NCF(A) 

2.25 A 

Human 

100% 

1994 

(365) 

1EXT(A) 

1.85 A 

Human 

100% 

1996 

(366) 

1TNR(R) 

2.85 A 

— 

100% 

1994 

(367) 

Vitamin D receptor 

1IE8(A), 1IE9(A) 

1.52 A, 1.40 A 

Human 

83% 

2001 

(368) 

1DB1(A) 

1.80 A 

Human 

83% 

1999 

(369) 

Xanthine-guanine phosphoribosyltransferase 

1A95(A), 1A97(A), 1A98(A) 

2.00 A, 2.60 A, 2.25 A 

Escherichia coli 

100% 

1998 

(370) 

1NUL(A) 

1.80 A 

Escherichia coli 

100% 

1996 

(371) 

1A96(A) 

2.00 A 

Escherichia coli 

100% 

1998 

(372) 





sensitive sequence methods [such as ISS (375)] fail to identify many
homologous relationships. Structure is more conserved than sequence,
so structural classification schemes (SCOP, CATH) have been valuable
for assigning proteins to functional groups. A now classic example of
functional understanding from structural homology was the discovery
that the Bcl-2 family of apoptosis proteins is homologous to
pore-forming toxins (376). This finding led to the suggestion that
Bcl-2 proteins may function by perforating mitochondrial membranes,
and has since opened new avenues of fruitful research.

In addition to structural homology as assessed by global similarities,
local structural features can give clues to function even when
proteins are not homologous. By identifying surface clusters of polar
residues that are well conserved in the sequence family, it is possible
to identify likely functional sites even when there is no obvious
structural homology. These three-dimensional motifs can be compared
with a structure database to identify similar motifs with known
function. A classic example is found among the serine proteases:
chymotrypsin and subtilisin share a similar catalytic triad
(His-Asp-Ser) but are otherwise unrelated structurally. The
PLP-dependent enzymes are famed for the diversity of both structure
and function, but even among this group, common structural motifs seem
to have evolved convergently (377).
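
As a rough illustration of the motif comparison described above, the sketch below scans a set of residue positions for His-Asp-Ser triples whose pairwise distances all fall under a cutoff. The coordinates, the 10 A cutoff, and the function name `find_triads` are assumptions made for illustration only; practical motif-matching programs compare side-chain atom geometry far more carefully.

```python
from itertools import product
from math import dist

# Hypothetical residue positions: (residue name, number) -> one
# representative side-chain coordinate in angstroms. These values are
# invented for the example and are not taken from any PDB entry.
residues = {
    ("HIS", 57): (0.0, 0.0, 0.0),
    ("ASP", 102): (3.0, 1.0, 0.5),
    ("SER", 195): (1.5, 3.5, 1.0),
    ("SER", 214): (25.0, 4.0, 9.0),
    ("ASP", 194): (18.0, 20.0, 2.0),
}


def find_triads(residues, cutoff=10.0):
    """Return (His, Asp, Ser) triples whose three pairwise distances
    are all below `cutoff` (an assumed, illustrative threshold)."""
    by_type = {"HIS": [], "ASP": [], "SER": []}
    for (name, number), xyz in residues.items():
        if name in by_type:
            by_type[name].append(((name, number), xyz))

    hits = []
    # Try every combination of one His, one Asp, and one Ser.
    for his, asp, ser in product(by_type["HIS"], by_type["ASP"], by_type["SER"]):
        pairs = [(his, asp), (his, ser), (asp, ser)]
        if all(dist(a[1], b[1]) < cutoff for a, b in pairs):
            hits.append((his[0], asp[0], ser[0]))
    return hits


triads = find_triads(residues)
print(triads)
```

Because only inter-residue distances are compared, a search of this kind does not depend on the overall fold, which is why it can relate proteins such as chymotrypsin and subtilisin that share a triad but not a structure.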

Simply searching for large clefts in the protein surface turns out to
be an extremely successful method to identify active sites. Nucleic
acid binding functions can be particularly obvious from an analysis of
surface electrostatics (378, 379). Mice homozygous for tubby
loss-of-function mutations show an obese phenotype, and therefore the
tubby protein has attracted considerable interest. However, 3 years
after the initial cloning of the tubby gene (380, 381), the molecular
function of the protein was still a mystery. The structure of the
conserved C-terminal domain of tubby was determined by X-ray
crystallography, and a large groove of highly positive charge
immediately led to the hypothesis that the protein acted as a
transcription factor (6). The search is now on for downstream targets
of tubby, and further structural work has demonstrated a new role for
the tubby protein in G-protein-coupled receptor (GPCR)-mediated signal
transduction (382). This ongoing story demonstrates the power of
structural approaches to determine function.

In tackling structures of proteins of unknown function, bound metal
ions, natural substrates, or even serendipitously bound small
molecules arising from the crystal preparation (e.g., buffers) often
suggest the location of an active site. If the side-chains
contributing to binding are well conserved, then this is good evidence
for the location of an active site and helps assess the "drugability"
of the protein. The recent structure of LuxS illustrates the power of
this approach (373). A number of genes had been identified that are
required for quorum sensing in bacteria by system 2. Quorum sensing by
the widely conserved system 2 has emerged as an intriguing mechanism
by which bacteria monitor their density and seems to be an important
component of the progression to virulence, at least in certain
pathogens. LuxS is the product of one of the genes required for system
2, but nothing was known of the molecular function of LuxS in this
pathway. Disruption of this pathway has promise in antibacterial drug
design, but whether LuxS would be an attractive target for small
molecule design was unclear. No information was available to develop a
biochemical assay, and besides, it was not clear what kind of library
to use in high-throughput screening. The structure of LuxS was solved
at Structural GenomiX in less than 2 months, and there are
representative X-ray structures from three different bacteria. The
structure showed that LuxS forms a dimer in which each monomer has a
zinc ion coordinated by a His-His-Cys triad and a water molecule.
Non-covalently bound methionine molecules were found in a pocket
formed at the dimer interface and close to the zinc ions (see Fig.
11.7, a and b). Methionine was shown to have bound as an artefact of
the purification procedure. With this information, it became
immediately clear that LuxS is likely a zinc metalloenzyme, and a
hypothesis for the likely physiological substrate emerged from
molecular modeling studies of the methionine binding site. This
example illustrates how structure can rapidly accelerate an
early-stage project, providing the starting point for assay
development and selection of an appropriate screening library (in this
case metalloenzyme inhibitor libraries would be desirable). The model
of the likely substrate bound to the active site suggests further
experiments to test this hypothesis and even provides a starting point
for medicinal chemistry exploration.

Figure 11.7. (a) The likely active site of LuxS identified by
searching for clusters of polar, conserved residues in the structure.
(b) Structure of the LuxS monomer highlighting the bound zinc ion
(magenta) and methionine (green). See color insert.

4.3 Pathways 

Increasingly in drug discovery, particular molecular pathways are
attracting interest, and often manipulation of any of a number of
pathway components would achieve the same end. Pathways controlling
apoptosis, the cell cycle, and inflammation all contain multiple
biologically validated targets. In microbial disease, several
biosynthetic pathways, such as peptidoglycan biosynthesis and
translation, are the targets of current drugs, and several new
pathways are promising targets for the development of novel agents.

Comprehensive, high-resolution structural information on multiple
pathway components provides a basis for the rational design of
inhibitors targeting the pathway. Interfering with the function of any
of a number of enzymes of a pathway may have equally beneficial
therapeutic value. Despite this, some enzymes may be more tractable
targets for the design of inhibitors than others. Access to
high-resolution structural information on all the components of a
therapeutically relevant pathway enables the rational choice of the
best-suited target(s) to pursue for the design of agonists and
antagonists. This choice may depend on such pragmatic considerations
as the access to libraries targeted to particular enzyme types and
available synthetic chemistry expertise. Furthermore, comparison of
the binding pockets of consecutive enzymes in the pathway that bind
similar (or identical) substrates and products may even enable the
design of inhibitors of multiple pathway components. Such a compound
may be particularly desirable in the development of novel
anti-microbial and cancer agents, where compound resistance can
rapidly emerge. The evolution of resistance to a drug that inhibits
two consecutive enzymes in an essential pathway is theoretically much
less probable than evolution of resistance to a single enzyme
inhibitor.
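
The probabilistic argument above can be made explicit under a simplifying assumption that the text does not state: that resistance-conferring changes in the two enzymes arise independently, with probabilities $p_1$ and $p_2$ per generation (symbols introduced here purely for illustration).

```latex
P(\text{resistance to the dual inhibitor}) = p_1 \, p_2 ,
\qquad
p_1 \, p_2 \le \min(p_1, p_2) \quad \text{for } 0 \le p_1, p_2 \le 1 .
```

With hypothetical values $p_1 = p_2 = 10^{-8}$, the joint probability is $10^{-16}$. Any mechanism that affects both targets at once, such as reduced uptake of the compound, would break the independence assumption and weaken this estimate.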

The non-mevalonate isopentenyl pyrophosphate biosynthesis pathway has
attracted attention in recent years as a novel target for the design
of anti-microbial inhibitors (383). At Structural GenomiX, the
structures of three consecutive enzymes in this pathway have been
solved. There is now a clear understanding of which pathway components
may be most tractable to inhibitor discovery, which likely have least
structural homology to human proteins, and even how to go about the
design of pan-pathway inhibitors.

4.4 Protein Structure Modeling 

An aim of structural genomics efforts, to provide high-quality
three-dimensional structures for every protein sequence, will not be
achieved by experimental approaches alone. Many consortiums are
selecting targets for X-ray crystallography that would provide the
templates for comparative modeling of all other sequences (384, 385).
As more structures are determined by NMR and X-ray crystallography,
the quality of the models will improve, both because more similar
templates will become available and because new methods for loop
modeling and ab initio structure prediction will undoubtedly emerge
(386, 387). Efforts are also underway both in industry and academia to
assemble databases of homology models for all sequences that can be
reasonably well modeled (388).

5 CONCLUSION 

Anyone who is involved or interested in drug discovery will recognize
the potential of protein crystallography to greatly enhance the
process. Whether this promise has been met to date is the subject of
considerable debate. What is certain, however, is that in the very
near future the advances in crystallography technology will render
this question moot. The histograms on the PDB website (27, 28) that
show the increasing rate of structures deposited over the last decade
are a startling visual indicator of the revolution that is occurring
in the field. Clearly, the impact will be felt in drug discovery very
soon and perhaps very dramatically, and it serves the audience of this
series to be well informed of these advances in technology and their
subtle limitations.

It is tempting to draw an analogy with the development of other
analytical technologies (NMR, FAB-MS) and conclude that protein
crystallography will soon leave the incubator of "big machine physics"
to become an everyday, routine tool used in the medicinal chemistry
laboratory. Hopefully, this chapter has shown some of the subtle
complexities of sample preparation and handling, data collection,
refinement, etc., that temper this vision and will likely keep this a
specialized field for some time.

1 INTRODUCTION

NMR spectroscopy has been widely used as a front-line tool in the
pharmaceutical industry for several decades. In the past, the main use
of NMR was in the structural characterization of organic molecules
synthesized in the course of medicinal chemistry programs. Indeed,
medicinal chemists have long regarded NMR as the premier tool to be
used in the structure characterization process, to confirm the
identity of intermediates or to determine the conformation of lead
molecules. Over the last decade major developments in both
instrumentation and methods have resulted in this traditional use of
NMR in the pharmaceutical industry being augmented by a range of
exciting new applications. Two of the most important of these are the
use of NMR in structure-based drug design and in screening for drug
discovery. Both applications differ from the traditional use of NMR in
that now the macromolecular binding partner of the medicinal compound
is included in the mixture to be analyzed; that is, contemporary
applications of NMR in drug discovery are predominantly focused on the
interaction between drug molecules and their macromolecular targets.

The aim of this chapter is to describe how NMR spectroscopy is used in
modern drug discovery. The term discovery is used generically
throughout to include processes that involve rational drug design as
well as those that involve discovery through NMR screening. The latter
is a relatively recent development and refers to the use of NMR as a
tool to screen a compound library, to identify a molecule or molecules
that bind to a chosen macromolecular target. Of course, the
distinction between "design" and "discovery" is often quite blurred.
This is nowhere more evident than in the recently developed SAR-by-NMR
approach (1), in which the discovery of several weakly bound ligands
from a screening program is intimately linked to a design process to
chemically join them. SAR-by-NMR represents an exciting new technique
for lead generation and is described in more detail later in this
chapter.

Drug design/discovery represents only the first stage in the whole
drug development process. As is clear from the other chapters in this
volume, there are many other steps that need to be taken once a lead
molecule has been designed or discovered. Although other stages of the
process, including lead optimization, toxicity studies, preclinical
investigations, and clinical monitoring, do not fall within the scope
of this chapter, it is worth mentioning that NMR spectroscopy
contributes significantly across the whole spectrum of drug
development, right through into the clinical domain. For example, NMR
spectroscopy has been applied for the detection of drug metabolites in
biological fluids, and magnetic resonance imaging, which is based on
the fundamental principles of NMR, plays an important role in clinical
investigations. It is increasingly being used to monitor the
functional outcomes of drug therapy. We briefly address these broader
applications of NMR before returning to the main topic of NMR in drug
discovery.

Figure 12.1. Overview of the drug development process and summary of
various types of NMR experiments that contribute at different stages.



1.1 Overview of Drug Development 

To give an overview of the breadth of applications of NMR, Fig. 12.1
summarizes the drug development process and indicates the role of NMR
at various stages. Drug development is an iterative process and can be
simplified by representing it with two interconnected cycles of
activity. Cycle A involves the design or discovery of an initial lead
followed by its synthesis and bioassay. Based on the initial assay
results there may be several loops around this cycle before commencing
the in vivo studies represented in Cycle B. At this stage
bioavailability, metabolism, and pharmacokinetic profiles must be
considered, and this may involve synthetic modifications of the lead
molecules to improve their druglike properties. Again, several loops
around Cycle B may be necessary before one or more development
candidates are identified. Ultimately one or two of these development
candidates are selected for progression through clinical trials.

As indicated in Fig. 12.1, it is convenient to envisage five broad
categories of NMR experiments that may contribute to this overall drug
development process.

1. Small molecule, or ligand-based, NMR. This involves studies of
drugs and drug leads, typically organic molecules with a molecular
weight <500 Da, but also including small proteins of up to a few kDa.
These studies may be used to characterize natural products or
synthetic drug leads, or to determine their conformation.

2. Macromolecular NMR. This involves studies of the macromolecular
targets of drugs, typically to determine their three-dimensional
structure and/or the nature of their complexes with ligands.

3. NMR screening. This involves the use of NMR to identify lead
molecules that bind to a macromolecular target. These studies
typically involve both small molecules and macromolecules and seek to
detect the presence of binding interactions between them.

4. Metabolic NMR. This involves studies of endogenous molecules whose
levels may be modified by drug treatment, or studies of the
metabolites of drugs themselves.

5. NMR imaging. Such studies provide anatomical information in an
animal model or human patient. This includes, for example, monitoring
the size of plaques or tumors in the brains of Alzheimer's or cancer
patients, respectively, during drug therapy.

It is clear from these descriptions that NMR covers a wide range of
applications in the pharmaceutical industry, although for the
remainder of this chapter we will focus on NMR in the drug
design/discovery phase of drug development, that is, on categories 1-3
of the preceding list. Together, the studies in categories 1 and 2 may
be classified as structure-based design, whereas category 3 relates to
drug discovery.

1.2 Scope of Chapter 

Our aim is to give a broad overview of the use of NMR as a tool in
structure-based design and in screening approaches to drug discovery.
The chapter also contains a description of the relevant NMR methods,
which are highlighted by illustrative examples. We briefly describe
the instrumentation required for such studies and discuss emerging
trends in the field. This includes developments in the field of drug
discovery in the postgenomic era that are likely to have an impact on
the way in which NMR is used, as seen for example in the recent
interest in structural genomics programs. NMR instrument developments
are also described. For example, recent advances in cryoprobe
technology promise to dramatically increase the sensitivity of NMR
spectroscopy and increase its application across the pharmaceutical
industry. Finally, a section outlining some of the practical
considerations in structure-based design and screening is included.
Future directions for the field are mentioned throughout the
discussion.

There have been a number of reviews that describe applications of NMR
in drug discovery or screening and the reader is referred to these for
additional information (2-15). Recent books covering aspects of NMR in
drug design are also available (16, 17).

It is assumed that most readers will be familiar with the basic
principles of NMR. However, for completeness and to define some of the
terms that will be used in this chapter it is useful to give a brief
overview of the principles. Excellent texts are available to provide
more detail (18, 19).

1.3 Principles of NMR Spectroscopy 

The underlying basis of NMR is that when
nuclei with a nonzero spin quantum number
are placed in a magnetic field they take up one
of a discrete number of quantized states. The
application of radiofrequency (rf) energy
produces transitions between these states. The
energy changes associated with these
transitions are detected as small voltages
induced in



Figure 12.2. Overview of the principles of NMR spectroscopy. Polarization of nuclear spins by a 
magnetic field is perturbed by application of a radiofrequency (rf) pulse. The resultant signal is 
Fourier transformed, to yield a spectrum reflecting the number and environments of nuclei in the 
sample. 


a receiver coil that are subsequently amplified,
digitized, and processed to yield spectra, as
illustrated in Fig. 12.2. The most commonly
studied NMR-active nucleus is the proton, ¹H,
but in modern NMR experiments ²H, ¹³C, and
¹⁵N nuclei are also very important. For these
heteronuclei it is common to isotopically
enrich the sample because of their low natural
abundance. This is particularly important for
studies of proteins, as will become apparent
later in this chapter. Occasionally, other
nuclei find specialist applications. For
example, in fluorine-containing drugs it is
possible to use sensitive ¹⁹F-NMR signals to
monitor interaction with target proteins, as
described later in this chapter.

In modern spectrometers the rf energy is
supplied in the form of short pulses (typically
~10 μs) that simultaneously excite all nuclei
of a given isotope type (e.g., all protons or all
¹³C nuclei). Nuclei of a given isotope that are
in different chemical environments by virtue
of their atomic locations in the molecule have
slightly different resonance frequencies and
lead to different oscillating voltages in the
receiver coil. The resultant combined signal,
termed a free induction decay (FID), is Fourier
transformed to give a spectrum that is
basically a plot of peak intensity vs. frequency,
with one peak for each chemically distinct
nucleus. These features are schematically
illustrated in Fig. 12.2. The frequency axis is
termed the chemical shift because it reflects
the local chemical environment of each
nucleus. The range of chemical environments of
nuclei in a molecule is such that chemical
shifts range up to only a few hundred parts per
million (ppm) of the base resonance frequency
for ¹³C and ¹⁵N. For ¹H the range is smaller
still, covering only about 10 ppm. Despite this
small range, chemical shifts provide valuable
diagnostic information on the environment of
the nucleus giving rise to the signal.
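
The pulse, acquire, and transform sequence described above can be
sketched numerically. The following is a minimal illustration rather
than spectrometer software: the two offset frequencies, the decay
constant, the spectral width, and the 500-MHz base frequency are all
assumed values chosen for the example, and a naive discrete Fourier
transform from the standard library stands in for the fast Fourier
transform that a real processing package would use.

```python
import cmath
import math

def simulate_fid(offsets_hz, t2_s, dwell_s, n_points):
    """Sum of decaying complex exponentials, one per chemically distinct nucleus."""
    fid = []
    for k in range(n_points):
        t = k * dwell_s
        decay = math.exp(-t / t2_s)
        fid.append(sum(decay * cmath.exp(2j * math.pi * f * t) for f in offsets_hz))
    return fid

def dft_magnitude(signal):
    """Naive O(n^2) discrete Fourier transform; returns one magnitude per frequency bin."""
    n = len(signal)
    return [
        abs(sum(signal[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n)))
        for m in range(n)
    ]

def hz_to_ppm(offset_hz, base_mhz):
    """Frequency offset in Hz divided by the base frequency in MHz gives ppm."""
    return offset_hz / base_mhz

# Assumed example: two proton environments 50 Hz and 120 Hz from the carrier.
OFFSETS_HZ = [50.0, 120.0]
DWELL_S = 1.0 / 512.0   # sampling interval, i.e. a 512-Hz spectral width
N_POINTS = 256          # gives 2 Hz per frequency bin

fid = simulate_fid(OFFSETS_HZ, t2_s=0.2, dwell_s=DWELL_S, n_points=N_POINTS)
spectrum = dft_magnitude(fid)

# Pick local maxima that rise above half of the tallest peak.
hz_per_bin = 1.0 / (DWELL_S * N_POINTS)
ceiling = max(spectrum)
peaks_hz = [
    m * hz_per_bin
    for m in range(1, N_POINTS - 1)
    if spectrum[m] > spectrum[m - 1]
    and spectrum[m] > spectrum[m + 1]
    and spectrum[m] > 0.5 * ceiling
]

# Express each peak position as a chemical shift offset on an assumed 500-MHz instrument.
peaks_ppm = [hz_to_ppm(f, 500.0) for f in peaks_hz]
```

In this sketch the two simulated environments are recovered as two
peaks, one per chemically distinct nucleus, and a 50-Hz offset at
500 MHz corresponds to 0.1 ppm, which shows why the proton range of
about 10 ppm is narrow in absolute frequency terms.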

The chemical shift is an extremely important
NMR parameter, but there are many other
parameters that can be discerned from NMR
spectra. Indeed, NMR is unusual among forms of
spectroscopy in that so many parameters are
associated with a spectrum other than just peak
intensity and frequency. These include coupling
constants, which provide information on local
conformations and on molecular connectivities;
nuclear Overhauser effects (NOEs), which
provide information on internuclear distances;
and relaxation parameters, which provide
information on molecular dynamics. Table 12.1
summarizes the main NMR parameters that
may be measured and highlights their
applications in the drug discovery process.

The following sections of this chapter provide
specific examples of how these various
parameters are useful in the drug discovery
process. Before doing this, though, it is useful
to consider some of the limitations of
one-dimensional (1D) NMR spectroscopy,
particularly when the detected nucleus is ¹H,
as is most commonly the case. With one signal
coming from each chemically distinct proton and
with those signals spread only over 10 ppm, it is






Table 12.1 NMR Parameters and Their Applications in Drug Design/Discovery

Parameter                    Information Provided Relevant to Drug Design

Chemical shift               Reflects local chemical environment; provides a fingerprint marker of
                             structure (particularly in HSQC spectra)
Coupling constants           Conformational analysis, establishing molecular connectivity
Nuclear Overhauser effect    Determining interproton distances, three-dimensional structures
Relaxation times             Molecular dynamics
Line-shape                   Detecting and quantifying chemical exchange processes
Peak intensities             Reflect relative number of nuclei, molecular symmetry
Amide exchange rates /       Hydrogen bonding or solvent exposure of amide protons
temperature coefficients


clear that spectral overlap can potentially be a 
major problem for anything but the simplest 
of molecules. The development of higher field 
NMR spectrometers, which effectively provide 
greater dispersion in the frequency dimen¬ 
sion, has contributed significantly to overcom¬ 
ing this limitation and increasing the applica¬ 
tion of NMR for studying pharmaceutically 
relevant molecules. In addition to such instru¬ 
mental developments, methodological ad¬ 
vances have also played a key role in extending 
the use of NMR. Multidimensional NMR 
methods have revolutionized biomolecular 
NMR spectroscopy by removing the limita¬ 
tions of a single frequency dimension, leading 
to the development of 2D, 3D, and 4D spectra. 

A simple way of illustrating multidimensional
NMR is through reference to heteronuclear
correlation spectroscopy, in which two
or more separate frequency dimensions are
correlated with one another. For example, a
particularly valuable 2D experiment is ¹H-¹⁵N
heteronuclear single quantum correlation
(HSQC) spectroscopy, in which the resultant
spectrum has two frequency axes, corresponding
to ¹H and ¹⁵N frequency dimensions, and
one intensity axis. Analogous ¹H-¹³C HSQC
spectra are also widely used. Such spectra are
normally represented with the intensity axis
in contour form so that they may be drawn in
two dimensions as a set of contour peaks.
Spectral peaks occur for pairs of ¹⁵N/¹H or
¹³C/¹H nuclei that are directly bonded to one
another and, with each frequency being
characteristic of the local chemical
environment, they represent a relatively simple
but highly characteristic fingerprint of the
sample. Figure 12.3 shows the relationship
between 1D and 2D spectra for the
immunosuppressive drug cyclosporin, and
includes a region of both the ¹H/¹⁵N and
¹H/¹³C HSQC spectra. In HSQC spectra, overlap
problems are alleviated because, even if two
protons have the same chemical shift and would
hence be overlapped in a 1D spectrum, chances
are that the respective heteronuclear signals
will not be overlapped, allowing the signals to
be resolved in the 2D spectrum. HSQC spectra
are widely used in NMR-based drug screening and
we will return to them later.
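
The way a second frequency dimension relieves overlap can be shown
with a small sketch. The peak list, residue labels, and tolerances
below are hypothetical values invented for the illustration; the code
simply counts which pairs of signals coincide in the ¹H dimension
alone and which still coincide once the ¹⁵N shift is also considered.

```python
from itertools import combinations

def overlapping_pairs(peaks, h_tol=0.02, n_tol=0.2):
    """Return (pairs overlapped in 1D 1H, pairs still overlapped in the 2D plane).

    peaks: list of (label, 1H shift in ppm, 15N shift in ppm).
    The tolerances are assumed resolution limits, not instrument specifications.
    """
    overlapped_1d = []
    overlapped_2d = []
    for (name_a, h_a, n_a), (name_b, h_b, n_b) in combinations(peaks, 2):
        if abs(h_a - h_b) <= h_tol:
            overlapped_1d.append((name_a, name_b))
            if abs(n_a - n_b) <= n_tol:
                overlapped_2d.append((name_a, name_b))
    return overlapped_1d, overlapped_2d

# Hypothetical amide peaks: A and B nearly coincide in 1H but differ in 15N.
PEAKS = [
    ("A", 8.24, 123.5),
    ("B", 8.25, 118.2),
    ("C", 8.40, 118.3),
    ("D", 7.95, 110.0),
]

in_1d, in_2d = overlapping_pairs(PEAKS)
```

For this made-up peak list the A/B pair is unresolved in a 1D proton
spectrum but separates cleanly in the HSQC plane, because the two
attached ¹⁵N nuclei resonate several ppm apart.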

Multidimensional NMR spectra are not
restricted to cases where the separate frequency
axes encode signals from different nuclear
types. Indeed, much of the early work on the
development of 2D NMR was performed on
cases where both axes involved ¹H chemical
shifts. The main value in such spectra comes
from the information content in cross peaks
between pairs of protons. In COSY-type spectra
(COSY = Correlation SpectroscopY) cross
peaks occur only between protons that are
scalar coupled (i.e., within 2 or 3 bonds) to
each other, whereas in NOESY (NOE
Spectroscopy) spectra cross peaks occur for
protons that are physically close in space
(<5 Å apart). A combination of these two types
of 2D spectra may be used to assign the NMR
signals of small proteins and provides
sufficient information on internuclear
distances to calculate three-dimensional
structures. Figure 12.3 includes a panel showing
the COSY spectrum of cyclosporin and highlights
the relationships between 1D ¹H-NMR spectra and
corresponding 2D homonuclear (COSY) and
heteronuclear (HSQC) spectra.
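
The through-space criterion for NOESY cross peaks can be illustrated
with a short sketch that lists the proton pairs expected to give an
NOE from a set of coordinates. The proton names and coordinates are
hypothetical, and the hard 5 Å cutoff is a simplification of what is
in practice a gradual fall-off in intensity with distance.

```python
import math
from itertools import combinations

NOE_CUTOFF_ANGSTROM = 5.0  # approximate upper distance limit quoted in the text

def expected_noe_pairs(protons, cutoff=NOE_CUTOFF_ANGSTROM):
    """List proton pairs closer than the cutoff, with their separation in angstroms."""
    pairs = []
    for (name_a, xyz_a), (name_b, xyz_b) in combinations(protons.items(), 2):
        distance = math.dist(xyz_a, xyz_b)
        if distance < cutoff:
            pairs.append((name_a, name_b, round(distance, 2)))
    return pairs

# Hypothetical proton coordinates in angstroms.
PROTONS = {
    "H1": (0.0, 0.0, 0.0),
    "H2": (3.0, 0.0, 0.0),
    "H3": (3.0, 4.5, 0.0),
    "H4": (10.0, 0.0, 0.0),
}

noe_pairs = expected_noe_pairs(PROTONS)
```

Here H1/H2 and H2/H3 fall within the cutoff, whereas H1/H3 (about
5.4 Å) and every pair involving H4 do not, regardless of how many
bonds separate the protons.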

Homonuclear 2D spectra are generally ap¬ 
plicable for the study of proteins up to only 
approximately 80 amino acids in size. For 






Figure 12.3. A schematic representation of the
(a) 1D ¹H; (b) 2D DQF-COSY; (c) ¹⁵N/¹H-HSQC;
and (d) ¹³C/¹H-HSQC spectra of the
immunosuppressive agent cyclosporin. Example
resonances/correlations from residues 6 and 7
have been highlighted to illustrate the
assignment process.

Figure 12.4. Block diagram of a modern NMR spectrometer. These systems use superconducting
magnets that are based on a solenoid of a suitable alloy (e.g., niobium/titanium or niobium/tin)
immersed in a dewar of liquid helium. The extremely low temperature of the magnet itself (4.2 K) is
well insulated from the sample chamber in the center of the magnet bore. The probe in which the
sample is housed usually incorporates accurate temperature control over the range typically of 4 to
40°C for biological samples. The rf coil in the probe is connected in turn to a preamplifier, receiver
circuitry, analog-to-digital converter (ADC), and a computer for data collection.


larger proteins the increased number of
signals leads to overlap problems and, in
addition, COSY-type spectra suffer from poor
sensitivity when the signal linewidths are of
the same order as or larger than ¹H,¹H scalar
coupling constants. Such limitations are
reduced by use of spectra of higher
dimensionality (i.e., 3D or 4D spectra) that are
based on correlations involving heteronuclear
rather than homonuclear coupling constants.
Such spectra are important in the structure
determination process for larger proteins and
are typically recorded for samples that
incorporate uniform labeling with ¹⁵N, or both
¹³C and ¹⁵N nuclei. Multidimensional spectra
that involve irradiation of ¹H, ¹³C, and ¹⁵N
nuclei are referred to as triple resonance
spectra.

The details of how multidimensional spectra
are obtained are beyond the scope of this
chapter, but it suffices to say that, like most
other modern NMR experiments, they involve
irradiation of the sample with a set of rf pulses
of defined length, frequency, and phase, with
specific interpulse delays. The pulse programs
for such experiments are commonly provided
with the spectrometer as part of a standard
library of experiments and may easily be run
by novice users after input of an appropriate
set of parameters to define the relevant
spectral widths and type of experiment required.


The above discussion provides a basic over¬ 
view of some of the methods important in 
modern NMR spectroscopy. Before examining 
specific applications in drug discovery it is use¬ 
ful to describe the instrumental requirements 
for such studies. 

1.4 Instrumentation 

NMR spectrometers comprise a powerful and
homogeneous magnet, a radiofrequency console
for generating appropriate rf pulses, a
probe for applying this rf energy to the sample
and receiving the resultant signals, and a
computer console for controlling the
experiments and acquiring the resultant data.
These features are summarized in Fig. 12.4.
Spectrometers are normally specified in terms
of the resonant frequency of protons at the
given magnetic field (e.g., 500 MHz corresponds
to a magnetic field of 11.7 tesla). Both
sensitivity and dispersion of signals increase
with increasing magnetic field.
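
The correspondence between field strength and spectrometer frequency
follows from the Larmor relation, in which the resonance frequency
equals the gyromagnetic ratio divided by 2π, multiplied by the field.
The sketch below is illustrative only: the gyromagnetic ratios are
approximate literature values quoted as magnitudes, and the function
names are invented for this example.

```python
# Gyromagnetic ratios divided by 2*pi, in MHz per tesla.
# Approximate literature values, given as magnitudes (15N is negative in sign).
GAMMA_MHZ_PER_T = {"1H": 42.577, "13C": 10.708, "15N": 4.316}

def larmor_mhz(nucleus, field_tesla):
    """Resonance frequency in MHz of a nucleus at the given magnetic field."""
    return GAMMA_MHZ_PER_T[nucleus] * field_tesla

def field_for_proton_mhz(proton_mhz):
    """Magnetic field in tesla that gives the stated proton frequency."""
    return proton_mhz / GAMMA_MHZ_PER_T["1H"]

proton_at_11_7 = larmor_mhz("1H", 11.7)            # close to the nominal 500 MHz
field_500 = field_for_proton_mhz(500.0)            # about 11.74 T
carbon_on_500 = larmor_mhz("13C", field_500)       # about 125.7 MHz
field_800 = field_for_proton_mhz(800.0)            # about 18.8 T
```

With these values an 11.7-tesla magnet gives a proton frequency of
roughly 498 MHz, which is rounded to the nominal "500 MHz", and ¹³C
nuclei in the same magnet resonate near 125.7 MHz.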

There have been some major break¬ 
throughs in both NMR instrumentation and 
methodology over the last decade that have 
greatly increased the utility of NMR for drug 
discovery applications. These are summarized 
in Table 12.2, which also includes some of the 
earlier milestones in the development of 
NMR. Most notable among recent innovations

Table 12.2 Milestones in the Development of NMR Spectroscopy

Year    Development                               Nature

1970    FT NMR                                    Instrumental
1975    Superconducting magnets                   Instrumental
1980    2D NMR                                    Methodological
1985    Protein structure determination           Methodological
1990    Isotope labeling/multidimensional NMR     Methodological
1990    Pulsed field gradients                    Instrumental/methodological
1995    NMR screening                             Methodological
1997    TROSY                                     Methodological
1998    LC-NMR/LCMS-NMR                           Instrumental
2000    Cryoprobes                                Instrumental


are the use of pulsed-field gradient methods
for improving spectral quality and allowing
new types of experiments to be performed,
transverse relaxation-optimized spectroscopy
(TROSY) methods (20) for increasing the size
of macromolecules that can be examined, and
cryoprobes for enhancing sensitivity. The
development of cryoprobes has resulted in the
biggest single gain in sensitivity over recent
years, effectively giving 500-MHz spectrometers
the sensitivity of 800-MHz spectrometers
(although without the gain in resolution!).
The enhanced sensitivity is obtained by cooling
the receiver coil and associated circuitry to
near liquid helium temperatures, thereby
reducing thermal noise. There were considerable
technical barriers to be overcome in developing
such probes because of the large
difference in temperature between the receiver
coils and the sample, which are only a
few millimeters apart. These barriers have
now been overcome and cryoprobes are being
installed in a large number of laboratories.
They are also becoming available for higher
field systems (800 MHz), thus providing further
sensitivity gains.
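
The size of the gain implied by the 500-MHz versus 800-MHz comparison
can be estimated with a short calculation. This is a rough sketch
under two stated assumptions that are not taken from the text: that
signal-to-noise scales approximately as the 3/2 power of the field,
and that it grows as the square root of the number of scans averaged.
Real probe performance depends on many other factors.

```python
def field_sensitivity_ratio(high_mhz, low_mhz, exponent=1.5):
    """Relative signal-to-noise of two fields, assuming S/N scales as B0**exponent."""
    return (high_mhz / low_mhz) ** exponent

def scan_time_factor(sensitivity_gain):
    """Factor by which averaging time shrinks for equal S/N, assuming S/N ~ sqrt(scans)."""
    return sensitivity_gain ** 2

# Gain a cryoprobe would need to deliver for a 500-MHz system to match 800 MHz.
implied_gain = field_sensitivity_ratio(800.0, 500.0)
time_saving = scan_time_factor(implied_gain)
```

Under these assumptions the implied sensitivity gain is about twofold,
which translates into roughly a fourfold reduction in acquisition time
for the same signal-to-noise.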

Although the basic configurations of
instruments tailored for structure-based design
or for NMR drug screening are similar, there
are some minor differences. For structure-based
design applications a relatively high
field spectrometer is required (>500 MHz),
usually equipped with three or four
radiofrequency channels for the simultaneous
irradiation of ¹H, ¹³C, ¹⁵N, and in some cases
²H nuclei. The greatest sensitivity and
dispersion are obtained with the highest
possible magnetic field. Instruments of up to
900 MHz are currently available but, at the time
of writing, only a few have been installed.
Numerous 800-MHz systems dedicated to
structure-based design have been installed in
pharmaceutical laboratories. The high field
instruments provide another advantage in that
TROSY experiments (20) can be used to produce a
marked improvement in spectral quality for
larger proteins. Such developments promise to
push higher the size of proteins whose structure
can be determined by NMR.

For NMR drug screening programs, the basic
requirement of a spectrometer of 500 MHz
or greater remains, but in addition the
spectrometer needs an interface that allows it
to sample a library of potential binding
ligands. This may be done
either by use of a discrete sample changer or a
flow-type system. Flow systems have the
potential advantage of increased throughput but
the potential disadvantage of precipitation
of protein samples. In practice this appears
not to have been a major problem and
both types of systems are in use in the
pharmaceutical industry. Sample changer systems
currently have the advantage that they may be
adapted for use with cryoprobe technology
(currently unavailable for flow systems).
Cryoprobes provide dramatic sensitivity gains,
which bring particular advantages to
the study of the macromolecule-ligand
interactions used in screening programs (21).

Pulsed-field gradients have become inte¬ 
gral to most modern NMR spectrometers and 
are routinely used both for structure determi¬ 
nation and screening experiments. Another 
recent development has been the interface of 
NMR spectrometers with other

Figure 12.5. A summary of the relationship between NMR screening and structure-based design.
(Adapted from Ref. 15.)

instrumentation such as liquid chromatography (LC)
and/or mass spectrometry (MS). The applica¬ 
tions of these instrumental developments to 
drug discovery have been recently reviewed (8, 
13, 22). 

1.5 Applications of NMR in Drug Design
and Discovery

Our focus here is on the use of NMR in the
discovery and design phase of drug development.
The major role of NMR in the design
process comes about by its exquisite ability to
provide structural information, whereas the
major role of NMR in discovery comes through
its use as a screening tool to detect the binding
of novel ligands to macromolecular targets. As
already noted, the latter application is a
relatively recent development but has created
much interest in the pharmaceutical industry
and promises to significantly enhance
applications of NMR in this industry. The impact
of the methodology is already becoming evident
even at this early stage, with several
SAR-by-NMR-derived leads currently in clinical
development. The discovery and design phases
are, however, often intimately
connected, with lead molecules discovered in
screening programs routinely being optimized
by use of structure-based design approaches
(Fig. 12.5).

In the context of this chapter structure- 
based design refers to the process of determin¬ 
ing the three-dimensional structure of a lead 
molecule or macromolecular target, or deter¬ 
mining the structure of the macromolecule- 
ligand complex, and using this information to 


design new drugs. The questions that may be 
asked when embarking on structure-based de¬ 
sign projects are: 

• What are the solution and bound conforma¬ 
tions of the ligand? 

• What is its charge/tautomeric state? 

• Which functional groups bind to the recep¬ 
tor and what charge state are they in? 

• What is the structure of the receptor? 

• Which parts interact with the ligand? 

• What is the geometry of the ligand-receptor 
complex? 

• What are the kinetics of binding and are 
there dynamic motions of ligand, receptor, 
or the complex? 

Table 12.3 summarizes these and other
questions and indicates the type of NMR
approaches that can provide answers. Remaining
sections of this chapter are organized
around the headings identified in Table 12.3.

In considering these questions it is
convenient to distinguish between ligand-based
design, where the structural focus is on the
small lead molecule, and receptor-based design,
where the aim is to determine the structure of
the macromolecular target. The NMR methods
used in ligand-based design have been well
established for many years, based on the use of
NMR by organic and natural product chemists
for more than four decades. However, there
have been some important recent advances in
NMR methods such as the use of pulsed field
gradients, and in the combination of NMR

Table 12.3 Information on Ligands, Macromolecules, and Their Complexes Sought in
Structure-Based Design and Relevant NMR Technologies Used to Derive This Information

Target                    Information                               NMR Technology

Ligand                    Solution conformation                     1D/2D NMR
                          Charge/tautomeric state                   Chemical shift/titrations
                          Solution dynamics                         Line-shape/relaxation analysis
                          Pharmacophore models                      All of the above, and TrNOE, of
                                                                    multiple ligands
                          Bound ligand conformation                 TrNOE
Macromolecule             3D structure                              2D/3D/4D NMR
                          Macromolecular dynamics                   Relaxation time measurements
                          Structure of articulated                  TROSY
                          macromolecules (e.g., multimeric
                          or membrane-bound receptors)
Ligand-macromolecular     Stoichiometry of complex                  Chemical shift/titration
complex                   Kinetics of binding                       Line width, titration analysis
                          Location of interacting sites             HSQC, isotope editing
                          Orientation of bound ligand               NOE docking
                          Bound ligand conformation                 TrNOE
                          Structure of complex                      3D/4D NMR
                          Dynamics of complex                       Relaxation time measurements


with other technologies such as LC and MS 
that promise to enhance applications in this 
field (13). The use of NMR to determine the 
three-dimensional structures of macromole¬ 
cules is a newer field, commencing only in 
around 1985, and is one that is still rapidly 
evolving. NMR screening is a still newer ap¬ 
proach, developed since around 1996. Ligand- 
based and receptor-based design are exam¬ 
ined in Sections 2 and 3, respectively, and 
screening-based approaches are examined in 
Section 4. 

2 LIGAND-BASED DESIGN 

Many naturally occurring molecules have po¬ 
tent bioactivity that renders them useful leads 
in the drug design process. These may be nat¬ 
urally occurring hormones, neurotransmit¬ 
ters, or other endogenous molecules, or they 
may be bioactive molecules from plants or mi¬ 
croorganisms. Furthermore, screening pro¬ 
grams on synthetic compound libraries fre¬ 
quently result in the discovery of bioactive 
molecules that then become starting points in 
drug design. The general aim of ligand- or an¬ 
alog-based design is to determine the struc¬ 
ture and conformation of a known bioactive 
molecule and then mimic this conformation in 


a designed lead compound, with the aim of 
improving the activity or druglike properties. 
The following sections examine various as¬ 
pects of ligand-based design and illustrate 
them with examples. 

2.1 Structure Elucidation 

If the bioactive molecule is a synthetic prod¬ 
uct, its structure may be rapidly deduced by a 
simple comparison of NMR parameters (often 
combined with MS) of the product relative to 
those of the known precursor, to see whether 
the desired chemical transformation has 
taken place. If the bioactive compound is an 
unknown molecule discovered in an active 
fraction in bioassay-guided screening, then 
the first step is to elucidate its structure. Typ¬ 
ical molecules that form the basis of such nat¬ 
ural products-based drug discovery studies in¬ 
clude "organic" natural products as well as 
small peptides and proteins. The approaches 
to structure elucidation for natural products 
and peptides/proteins are a little different 
from each other and are described in turn. 

2.1.1 Structure Elucidation of Natural Prod¬ 
ucts. In the case of nonpeptidic natural prod¬ 
ucts the main structural focus initially is to 
elucidate the carbon framework. This

Figure 12.6. Illustration of the HMBC correlations
(arrows) used to assign the positions of
two of the quaternary methyl groups
in taxol.

normally involves a combination of 1D ¹H and
¹³C-NMR, followed by homonuclear (DQF-COSY,
TOCSY, ROESY, or NOESY) and heteronuclear
(HSQC, HMBC) 2D experiments.
Heteronuclear multiple bond correlation
(HMBC) spectra are particularly valuable
because they assist in tracing the backbone of
the molecule. Such spectra display cross peaks
between a ¹³C nucleus and protons connected
within two or three bonds and, in doing so,
provide valuable information on molecular
connectivity. Figure 12.6 shows typical HMBC
correlations seen for selected regions of taxol,
a plant-derived natural product that is
currently a leading treatment for breast and
ovarian cancers. Although the structure of taxol
itself was originally deduced from a combination
of X-ray crystallography on a degradation
product and a range of ¹H and ¹³C spectra in
the 1970s, before HMBC spectra had been
invented, HMBC spectra have been widely used
for studies of the many taxol derivatives that
have been examined in the last decade.

Elucidation of the carbon framework of
natural products often yields substantial
information about the three-dimensional
structure at the same time, but if there are
remaining questions on the stereochemistry of
chiral centers or other factors affecting the
three-dimensional structure, these can usually
be resolved from NOESY spectra and/or an
analysis of coupling constants. We will return
to the taxol example later in Section 2.2 when
describing conformational analysis.

2.1.2 Structure Determination of Bioactive
Peptides. In contrast to the process described
for organic molecules, the structure elucidation
of peptide-based natural products involves
two distinct steps: (1) the elucidation of
the primary structure (amino acid sequence)
followed by (2) a determination of secondary/
tertiary structure. The primary structure
determination is routinely done through Edman
sequencing, or more recently by MS-MS methods.
NMR plays a key role in the elucidation of
the secondary and tertiary structure of
peptides, mainly based on 2D homonuclear NMR
spectroscopy. A combination of DQF-COSY
and TOCSY (TOtal Correlation SpectroscopY)
spectra is used to assign spin systems
to amino acid types and then NOESY spectra
are used to sequentially assign the resonances
to individual protons in the peptide (23). The
three-dimensional structure is then determined
by deriving a series of internuclear distance
restraints from the NOESY spectrum
and using them in a simulated annealing
algorithm to calculate a family of structures
consistent with them.
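
One common way of turning NOESY cross-peak intensities into distance
restraints can be sketched as follows. The sketch assumes the isolated
spin-pair approximation, in which intensity falls off as the inverse
sixth power of distance, and calibrates against a reference pair of
known separation. The reference distance, the intensities, and the
upper-bound bins are illustrative assumptions, not values taken from
this chapter.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate a distance (angstroms) from an NOE intensity via r ~ I**(-1/6)."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)

def upper_bound(distance, bins=(2.7, 3.5, 5.0)):
    """Map a distance estimate to a strong/medium/weak upper-bound restraint.

    Returns None when the estimate exceeds the largest bin.
    """
    for limit in bins:
        if distance <= limit:
            return limit
    return None

# Hypothetical calibration: a reference pair 1.8 angstroms apart with intensity 100.
REF_INTENSITY = 100.0
REF_DISTANCE = 1.8

# A cross peak 64 times weaker than the reference is twice as far apart.
weak_peak_distance = noe_distance(REF_INTENSITY / 64.0, REF_INTENSITY, REF_DISTANCE)
weak_peak_restraint = upper_bound(weak_peak_distance)
```

Because of the sixth-power dependence, a 64-fold drop in intensity
corresponds to only a doubling of distance, which is why restraints
are usually applied as loose upper bounds rather than exact values.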

Because the structure determination of
peptides and proteins represents a very
important contribution of NMR to the drug
development process, it is informative to
describe the process in more detail. To do this
we will use the recently developed
peptide-based drug MVIIA as an example.

2.1.2.1 NMR Structure of Ziconotide: A
Novel Treatment for Pain. MVIIA, now known
as Ziconotide, is a 25-amino acid peptide
originally discovered in the venom of the
marine cone snail, Conus magus. Like other
ω-conotoxins it is a potent blocker of N-type
calcium channels, giving it a wide range of
potential therapeutic applications. When
delivered intrathecally (i.e., through spinal
infusion), it is approximately 1000 times more
potent than morphine as an analgesic and has
great potential for the treatment of
intractable cancer pain (24). Figure 12.7 shows
the peptide sequence and illustrates selected
regions of the TOCSY and NOESY spectra.

As seen in Fig. 12.7, the TOCSY experiment
is useful for classifying spin systems by
amino acid type, with typically the most useful
region being the "skewers" emanating from
individual NH shifts (~7-10 ppm). For each
NH proton in the peptide a series of cross
peaks to the α, β, and other side-chain protons
is observed and these patterns define the spin
system as belonging to a particular type of
amino acid. Note, however, that there is some
degeneracy in the resultant patterns. The NH
side-chain pattern is truncated if there is a
break of more than three bonds between protons
within the spin system. This means, for
example, that the skewers for aromatic residues
extend only as far as the β-protons and
they therefore appear similar to other “AMX”
residues such as Cys, Ser, Asp, or Asn.
Nevertheless, the ability to assign signals to
either individual amino acid types or to the
AMX group is a useful starting point in the
assignment. However, such spectra provide no
information about the sequential location of an
amino acid if it is not unique in the sequence.
These sequential assignments are obtained
from the NOESY spectrum, as illustrated in
the sequential walk shown in the middle panel
of Fig. 12.7. The aim of the sequential
assignment process is to locate adjacent amino
acid spin systems, principally through a cross
peak between the αH proton of one residue (i)
and the NH of the following residue (i + 1),
often denoted as dαN(i, i + 1). Additional
support for the assignment is usually also
sought in dβN(i, i + 1) and dNN(i, i + 1)
correlations. At the early stages of an
assignment it is impossible to be certain
whether a particular cross peak is a sequential
or longer range cross peak; however, as the
assignment procedure progresses, ambiguities
become resolved. The assignment process is
generally highly convergent, in that once a
series of correct assignments is made the number
of choices for remaining cross peaks diminishes,
in principle making their assignment easier.
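
The logic of the sequential walk can be sketched in a few lines. The
spin-system labels, chemical shifts, cross-peak list, and matching
tolerance below are all hypothetical, and the sketch considers only
dαN(i, i + 1) connectivities; a real assignment would also weigh dβN
and dNN correlations and cope with overlap and longer range peaks.

```python
def sequential_links(spin_systems, noesy_peaks, tol=0.02):
    """Link spin system i to j when a NOESY peak pairs aH of i with NH of j.

    spin_systems: {label: {"NH": ppm, "HA": ppm}}
    noesy_peaks: list of (aH shift, NH shift) cross peaks in the fingerprint region.
    Intraresidue peaks (i == j) are ignored.
    """
    links = {}
    for ha_shift, nh_shift in noesy_peaks:
        donors = [s for s, v in spin_systems.items() if abs(v["HA"] - ha_shift) <= tol]
        acceptors = [s for s, v in spin_systems.items() if abs(v["NH"] - nh_shift) <= tol]
        for i in donors:
            for j in acceptors:
                if i != j:
                    links.setdefault(i, set()).add(j)
    return links

def walk(links, start):
    """Follow unambiguous links from a starting spin system until the chain ends."""
    chain = [start]
    while True:
        candidates = links.get(chain[-1], set()) - set(chain)
        if len(candidates) != 1:
            break  # chain end, or an ambiguity that needs other evidence
        chain.append(next(iter(candidates)))
    return chain

# Hypothetical spin systems identified from a TOCSY spectrum.
SPIN_SYSTEMS = {
    "s1": {"NH": 8.50, "HA": 4.30},
    "s2": {"NH": 8.10, "HA": 4.60},
    "s3": {"NH": 7.80, "HA": 4.10},
    "s4": {"NH": 8.90, "HA": 4.85},
}

# Hypothetical NOESY fingerprint peaks: one intraresidue and three sequential.
NOESY_PEAKS = [(4.30, 8.50), (4.30, 8.10), (4.60, 7.80), (4.10, 8.90)]

links = sequential_links(SPIN_SYSTEMS, NOESY_PEAKS)
chain = walk(links, "s1")
```

In this made-up case each αH shows a cross peak to exactly one other
NH, so the four spin systems chain together without ambiguity; the
walk stops as soon as a step has no candidate or more than one.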

Because peptides are polymers of amino acid
units, the repeated NH, Hα, and side-chain
protons tend to fall in characteristic chemical
shift ranges that can be useful in looking for
patterns to identify amino acid types. Table
12.4 shows typical chemical shifts for each of
the 20 common amino acids when located in a
"random-coil" environment (23, 25, 26). It is
important to stress that these shifts can vary
quite considerably in structured proteins (by
up to several ppm) and are more useful for
pattern recognition purposes than for exact
identification of a particular residue. In the
case of the Hα protons, the differences between
the actual shifts in a structured protein
and these random-coil values have an additional
important use, in that they provide an
indication of the local secondary structure.
Intuitively, the further a chemical shift is
from its random-coil value, the more likely it
is that the residue is in a structured
environment.

After the assignment is complete it is
possible to derive substantial information about
the secondary structure from an analysis of
chemical shifts, coupling constants, and
NOEs, even before the three-dimensional
structure calculations are commenced. Figure
12.8 shows a typical summary of the relevant
NMR information, again using the data for
MVIIA as an example (27, 28). Trends in these
data provide a general indication of major
elements of secondary structure. For example, a
series of strong dαN(i, i + 1), relative to
dNN(i, i + 1), NOEs often indicates an
extended or β-type structure, whereas strong
dNN(i, i + 1) NOEs indicate local helical
structure or turns. Large JαN coupling
constants (>8.5 Hz) are associated with extended
structure and small ones (<5 Hz) with helical
structure. Similarly, deviations of chemical
shifts from random-coil values, often
represented in terms of "chemical shift indices"
(29), indicate extended (positive values) or
helical structure (negative values).
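
The chemical shift index idea can be sketched using the random-coil
αH values listed in Table 12.4. The 0.1 ppm scoring threshold and the
minimum run lengths used here are assumptions of this sketch, the
observed shifts are invented, and the function names are hypothetical;
the sign convention (positive for extended, negative for helical)
follows the text.

```python
# Random-coil alpha-proton shifts (ppm), copied from the aH column of Table 12.4.
RANDOM_COIL_HA = {
    "Ala": 4.32, "Arg": 4.34, "Asn": 4.74, "Asp": 4.64, "Cys": 4.71,
    "Gln": 4.34, "Glu": 4.35, "Gly": 3.96, "His": 4.73, "Ile": 4.17,
    "Leu": 4.34, "Lys": 4.32, "Met": 4.48, "Phe": 4.62, "Pro": 4.42,
    "Ser": 4.47, "Thr": 4.35, "Trp": 4.66, "Tyr": 4.55, "Val": 4.12,
}

def csi_scores(residues, observed_ha, threshold=0.1):
    """Score each residue +1, -1, or 0 from its aH shift relative to random coil."""
    scores = []
    for residue, shift in zip(residues, observed_ha):
        deviation = shift - RANDOM_COIL_HA[residue]
        if deviation > threshold:
            scores.append(1)
        elif deviation < -threshold:
            scores.append(-1)
        else:
            scores.append(0)
    return scores

def runs(scores, value, min_length):
    """Return (start, end) index pairs for runs of `value` at least min_length long."""
    found = []
    start = None
    for idx, score in enumerate(list(scores) + [None]):  # sentinel closes a final run
        if score == value and start is None:
            start = idx
        elif score != value and start is not None:
            if idx - start >= min_length:
                found.append((start, idx - 1))
            start = None
    return found

# Hypothetical five-residue stretch with made-up observed aH shifts (ppm).
RESIDUES = ["Cys", "Lys", "Gly", "Lys", "Ala"]
OBSERVED_HA = [5.00, 4.70, 4.30, 4.33, 4.00]

scores = csi_scores(RESIDUES, OBSERVED_HA)
extended_runs = runs(scores, 1, 3)    # candidate extended (beta-type) segments
helical_runs = runs(scores, -1, 4)    # candidate helical segments
```

For the invented shifts above, the first three residues score +1 and
form a candidate extended segment, whereas the single -1 at the end is
too short a run to suggest a helix.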

An additional useful parameter is the exchange
rate of amide protons after dissolution
of the sample in D₂O. Slowly exchanging
amide protons indicate protection from solvent
and possible involvement in intramolecular
hydrogen bonds associated with elements
of secondary structure. All of the NMR and
slow exchange data can be consolidated to give

Figure 12.7. Schematic representations of 2D-NMR spectra of the conotoxin MVIIA. (a) The
fingerprint region of the TOCSY spectrum with selected spin systems marked. (b) Fingerprint region of
the NOESY spectrum showing two (K2-A6 and L11-Y13) sequential walks. (c) NH-NH region of the
NOESY spectrum showing correlations between the NH protons of D14 and G15; C16 and T17; and
S22 and G23.

Table 12.4 Chemical Shifts for the 20 Common Amino Acid Residues
in Random-Coil Peptidesᵃ

Residue   NH      αH      βH            Others

Ala       8.24    4.32    1.39
Arg       8.23    4.34    1.89, 1.79    γCH₂ 1.70, 1.70; δCH₂ 3.32, 3.32; NH 7.17, 6.62
Asn       8.40    4.74    2.83, 2.75    γNH₂ 7.59, 6.91
Asp       8.34    4.64    2.84, 2.75
Cys       8.43    4.71    3.28, 2.96
Gln       8.32    4.34    2.13, 2.01    γCH₂ 2.38, 2.38; δNH₂ 6.87, 7.59
Glu       8.42    4.35    2.09, 1.97    γCH₂ 2.31, 2.28
Gly       8.33    3.96
His       8.42    4.73    3.26, 3.20    2H 8.12; 4H 7.14
Ile       8.00    4.17    1.90          γCH₂ 1.48, 1.19; γCH₃ 0.95; δCH₃ 0.89
Leu       8.16    4.34    1.65, 1.65    γH 1.64; δCH₃ 0.94, 0.90
Lys       8.29    4.32    1.85, 1.76    γCH₂ 1.45, 1.45; δCH₂ 1.70, 1.70; εCH₂ 3.02, 3.02; εNH₃⁺ 7.52
Met       8.28    4.48    2.15, 2.01    γCH₂ 2.64, 2.64; εCH₃ 2.13
Phe       8.30    4.62    3.22, 2.99    2,6H 7.30; 3,5H 7.39; 4H 7.34
Pro               4.42    2.28, 2.02    γCH₂ 2.03, 2.03; δCH₂ 3.68, 3.65
Ser       8.31    4.47    3.88, 3.88
Thr       8.15    4.35    4.22          γCH₃ 1.23
Trp       8.25    4.66    3.32, 3.19    2H 7.24; 4H 7.65; 5H 7.17; 6H 7.24; 7H 7.50; NH 10.22
Tyr       8.12    4.55    3.13, 2.92    2,6H 7.15; 3,5H 6.86
Val       8.03    4.12    2.13          γCH₃ 0.97, 0.94

ᵃThe backbone shifts (αH and NH, ppm) are from Wishart et al. (26). The remaining shifts are from Wüthrich 1986 (23).


an accurate representation of secondary structure, as indicated in the lower panel of Figure 12.8. In the case of MVIIA a triple-stranded β-sheet may be deduced on the basis of the local NOE, coupling, chemical shift, and amide-exchange NMR data.
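
As a rough sketch of how a chemical shift index can be scored, the following Python fragment compares observed αH shifts with the random-coil αH values of Table 12.4. The ±0.1 ppm threshold follows the usual CSI convention; the function name and the example shifts are hypothetical and are not taken from the MVIIA data.

```python
# Random-coil alpha-proton shifts (ppm) copied from Table 12.4.
RANDOM_COIL_HA = {
    "Ala": 4.32, "Arg": 4.34, "Asn": 4.74, "Asp": 4.64, "Cys": 4.71,
    "Gln": 4.34, "Glu": 4.35, "Gly": 3.96, "His": 4.73, "Ile": 4.17,
    "Leu": 4.34, "Lys": 4.32, "Met": 4.48, "Phe": 4.62, "Pro": 4.42,
    "Ser": 4.47, "Thr": 4.35, "Trp": 4.66, "Tyr": 4.55, "Val": 4.12,
}


def csi_scores(residues, observed_ha, threshold=0.1):
    """Return +1, 0, or -1 for each residue.

    +1: observed shift is more than `threshold` ppm downfield of random coil
        (runs of +1 suggest extended/beta structure).
    -1: observed shift is more than `threshold` ppm upfield of random coil
        (runs of -1 suggest helical structure).
     0: within the threshold of the random-coil value.
    """
    scores = []
    for residue, shift in zip(residues, observed_ha):
        deviation = shift - RANDOM_COIL_HA[residue]
        if deviation > threshold:
            scores.append(1)
        elif deviation < -threshold:
            scores.append(-1)
        else:
            scores.append(0)
    return scores


# Hypothetical observed shifts, for illustration only.
example = csi_scores(["Ala", "Val", "Gly"], [4.80, 4.00, 3.98])
```

Only runs of consecutive identical scores are normally interpreted; isolated +1 or -1 values carry little structural meaning.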

Once all peaks in the 2D spectra have been assigned, cross peaks in the NOESY spectrum are used to derive a series of interproton distance restraints. These are then used in a simulated annealing algorithm to calculate a family of 3D structures consistent with the input restraints. Fig. 12.9 shows two commonly used methods of representing such NMR-derived structures, either as a stereoview of the superimposed family of structures or as a ribbon diagram, in which elements of secondary structure are highlighted. For the latter representation

Figure 12.8. A summary of the NMR data observed for MVIIA. (a) Hα-NH sequential NOEs. (b) NH-NH sequential NOEs. (c) Hβ-NH sequential NOEs. (f-h) Other short-range NOEs. The thickness of the bar indicates the strength of the observed NOE (weak, medium, or strong). (d) Three-bond NH-Hα coupling data, where upward-pointing arrows indicate a large coupling (>8 Hz) and downward-pointing arrows indicate a small coupling (<5 Hz). (e) H/D exchange data, where a filled circle represents a slow exchanging NH. (i) Chemical shift index (CSI) data. The CSI uses a scoring system that compares Hα shifts to random-coil chemical shifts. A sequence of consecutive +1 scores is indicative of β-structure, whereas a sequence of consecutive -1 scores suggests helical structure. (j) The β-sheet of MVIIA. Double-headed arrows indicate observed NOEs and broken lines indicate proposed H-bonds.

Figure 12.9. (a) A stereoview of the superimposed backbone structures of the 20 lowest energy conformations for MVIIA (27). (b) Ribbon diagram of MVIIA.

the lowest energy or average member of the ensemble is often chosen as representative of the structure. It is important, however, to examine the full ensemble to gain a complete understanding of the structure. Regions of disorder in the ensemble can be indicative of a lack of sufficient distance restraints, perhaps attributable to overlap or assignment errors, or may be related to local flexibility.
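
The conversion of NOESY cross-peak intensities into distance restraints can be sketched as follows. The fragment assumes the isolated spin-pair approximation, in which intensity scales as the inverse sixth power of the interproton distance, and calibrates against a cross peak of known distance. The upper bounds used for the strong, medium, and weak classes (2.7, 3.5, and 5.0 Å) are typical illustrative choices rather than values taken from the MVIIA study, and the function names are hypothetical.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate an interproton distance (in angstroms) from a cross-peak intensity.

    Assumes intensity is proportional to r**-6, calibrated against a
    reference cross peak of known distance.
    """
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


# Illustrative upper bounds (angstroms) for each NOE class; an assumption.
UPPER_BOUNDS = (("strong", 2.7), ("medium", 3.5), ("weak", 5.0))


def classify_restraint(distance):
    """Map an estimated distance onto the tightest class whose bound contains it."""
    for label, bound in UPPER_BOUNDS:
        if distance <= bound:
            return label, bound
    return None  # too distant to yield a usable restraint


# A cross peak 64 times weaker than a 2.5 angstrom reference is about twice as far away.
far_pair = noe_distance(1.0 / 64.0, 1.0, 2.5)
```

In practice the restraints are usually applied as upper bounds only, because spin diffusion and internal motion make the intensities unreliable as precise distances.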

In the case of MVIIA the peptide itself is being clinically developed as the active drug for administration through the intrathecal (spinal infusion) route. However, in general, peptides have a range of potential disadvantages as drugs, including poor bioavailability and susceptibility to proteolytic breakdown. Thus, in many cases involving peptide-based leads, structural information of the type described above might be used as a starting point to design smaller constrained peptides or nonpeptidic mimics. This is the case, for example, in the development of endothelin antagonists described below.

2.1.2.2 Endothelin as a Lead in Ligand-Based Design. Endothelin (ET), shown in Fig. 12.10, is a 21-amino acid endothelial-derived constricting factor that has gained prominence as a pharmacological lead molecule. Interest in the peptide arose because of its potent renal, pulmonary, and neuroendocrine activities. Endothelin and its isoforms have been implicated in a wide variety of disease states including ischemia, cerebral vasospasm, stroke, renal failure, hypertension, and heart failure (30). It exerts its pharmacological effect by acting on specific G-protein-coupled receptors. In mammalian species two receptors, ETA and ETB, have been cloned; both are widely distributed in human tissue and are distinguished by different responses to various ET isoforms.

The NMR-derived three-dimensional structure of ET-1 consists of several distinct regions, including a random-coil N-terminus, a β-turn over residues 5-8, followed by a short helical region and a flexible C-terminal tail (as summarized in Ref. 31). The presence of the flexible tail in solution is not surprising, as may be imagined from the primary sequence shown in Fig. 12.10. Although solution structures of ET and its analogs (32-46) determined by NMR have been valuable in defining the gross conformation of these molecules, the flexibility of the tail in solution makes it difficult to extrapolate to the bound state. Indeed, an X-ray structure of ET-1 shows a C-terminal tail conformation quite different from the random-coil arrangement seen in solution (46). The bound conformation may be different again.

There is clearly an advantage to having lead molecules with reduced flexibility, given that their solution conformation will intrinsically provide a better reflection of the bound conformation. In addition, the development of a more rigid drug will reduce unfavorable entropic contributions to binding energy. Indeed, a range of small cyclic peptides that are ETA- or ETB-selective antagonists have been discovered and provide valuable leads to the development of potential therapeutics. NMR studies have been instrumental in determining their solution conformations. For example,

Figure 12.10. (a) Primary sequence and disulfide connectivities of endothelin-1 (ET-1). (b) Primary structure of the cyclic endothelin antagonist BE18257B and (c) a family of 36 NMR structures, which demonstrate the well-defined nature of the cyclic peptide backbone.

the rather well defined solution conformation (47) of the ETA-selective antagonist BE18257B (shown in Fig. 12.10) contrasts with the flexibility of the tail region of ET that this peptide is thought to mimic. The discovery and development of these molecules illustrate the principle that cyclic peptides are often more suitable than linear peptides as lead ligands in drug design. In addition to having better-defined and less-flexible conformations than their linear counterparts, they generally have improved bioavailability and resistance to protease attack.

We shall return to endothelin as a lead in 
drug design, in relation to a nonpeptidic an¬ 
tagonist. The underlying theme illustrated by 
the endothelin example is that ligand-based 
design often proceeds from initial studies of 
flexible endogenous molecules (particularly 
peptides) to constrained mimics (e.g., cyclic 
peptides) and often culminates in the develop¬ 
ment of nonpeptidic drug leads. NMR assists 
by defining the structures of the lead and sub¬ 
sequent molecules. 


2.1.3 Instrumental Advances and their Impact on Structure Elucidation. Over the last few years there have been several exciting instrumental developments that promise to dramatically expand the role NMR will play in the drug discovery process. These relate to the combination of NMR with other technologies such as LC and/or MS and the use of NMR to directly monitor reactions carried out on solid-phase resins (8, 13, 22). The latter promises to indirectly enhance drug discovery programs by improving the monitoring and hence efficiency of solid-phase combinatorial synthesis. Effectively, resin-based syntheses can be monitored at successive stages without the need to cleave intermediate products from the resin.

As already mentioned, the additional sensitivity brought about by cryoprobe technology promises to enhance a wide range of NMR applications, but will be particularly important in natural products-based drug discovery. In many cases only limited amounts of pure compounds are isolated from natural products extracts and sensitivity has been a major limiting

Figure 12.11. Illustration of the Karplus relationship between three-bond scalar coupling constants and the dihedral angle of the intervening bond. The relationship is indicated for the φ torsion angle of the H2 and H3 protons within the rigid core of taxol and related derivatives. See Fig. 12.6 for the structure of taxol.

factor in structure elucidation. LC/MS/NMR systems will greatly improve the efficiency of such analyses by minimizing the need for separate sample-handling steps for the different analytical technologies.

2.2 Conformational Analysis 

Usually only 1D or 2D NMR methods are required to determine the solution conformation of bioactive ligands. Useful tools include analysis of chemical shifts, coupling constants, and NOEs. An assumption inherent in the application of such studies to drug design is that the solution conformation will be maintained on binding to the receptor. This is justified in the case of relatively rigid ligands. However, for potentially flexible ligands the possibility of changes in conformation on binding must be considered, as noted above for the case of endothelin.

Coupling constants and NOEs are the main NMR parameters used in determining the solution conformations of drug leads. NOEs provide information about through-space proximity. Three-bond vicinal coupling constants are particularly valuable because their dependency on the intervening dihedral angle through the Karplus relationship allows local geometry to be determined. This is illustrated in Fig. 12.11 for taxol. Although there are several vicinal coupling constants in this molecule (Fig. 12.6), only one, 3J(H2-H3), occurs in a region of the molecule that is expected to be conformationally rigid and thus suitable for conformational determination by use of coupling constants. In taxol and a range of analogs this coupling is in the range 4-7 Hz, consistent with partially eclipsed dihedral angles of approximately 120-140° for this ring-constrained structure. This is in good agreement with the X-ray structure of a taxol analog, where the angle is 120°. Note that, in general, such a Karplus analysis does not give a unique solution unless several coupling constants sampling the same dihedral angle are present, and that it relies on the assumption that the molecule exists only in a single conformation in solution. Although it is generally believed that

Figure 12.12. Chemical shift changes of the β-protons of Asp14 in MVIIA illustrating the lack of titration of the adjacent carboxyl group, indicating its involvement in a salt bridge. By contrast, the shift of a control random-coil peptide varies with an apparent pKa value of 3.7, as expected for an uncomplexed carboxyl moiety in peptides.

this is the case for the core of taxol, recent relaxation data (48) described in Section 2.5 suggest that this conclusion may need to be reexamined.
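
The non-uniqueness of a Karplus analysis can be made concrete with a small sketch. The fragment below uses the general Karplus form J = A cos²θ + B cos θ + C; the coefficients (A = 7.76, B = -1.10, C = 1.40 Hz) are generic illustrative values assumed here for an H-C-C-H fragment and are not the parameterization used for taxol in the cited work. The function names are likewise hypothetical.

```python
import math


def karplus(theta_deg, a=7.76, b=-1.10, c=1.40):
    """Three-bond coupling (Hz) predicted for a dihedral angle in degrees.

    The default coefficients are illustrative assumptions, not fitted values.
    """
    cos_t = math.cos(math.radians(theta_deg))
    return a * cos_t ** 2 + b * cos_t + c


def consistent_angles(j_obs, tol=0.3):
    """List the whole-degree angles in 0-180 whose predicted J is within tol of j_obs."""
    return [theta for theta in range(0, 181)
            if abs(karplus(theta) - j_obs) <= tol]


# A single observed coupling of 5 Hz is compatible with two separate
# angular ranges, one below and one above 90 degrees.
candidates = consistent_angles(5.0)
```

With these assumed coefficients the predicted coupling rises from roughly 4 Hz at 120° to roughly 7 Hz at 140°, in line with the range quoted above, but angles near 40° fit a 5 Hz coupling equally well. Additional couplings or NOEs are needed to choose between the two solutions.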

In addition to studies of the taxol core, there have been a large number of studies of the conformations of the side chains of taxol. It appears that these are certainly flexible and that the molecule may adopt both extended and folded conformations of the side chains. In a case like this the observed vicinal coupling constants are a weighted average of those from the participating conformers.
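
The averaging can be written out explicitly. The sketch below assumes fast exchange between conformers, so that the observed coupling is the population-weighted mean of the individual values; for a two-state case it also inverts the relationship to estimate a population. All numerical values and function names are hypothetical.

```python
def averaged_coupling(populations, couplings):
    """Population-weighted mean coupling (Hz) for conformers in fast exchange."""
    total = sum(populations)
    return sum(p * j for p, j in zip(populations, couplings)) / total


def two_state_population(j_obs, j_a, j_b):
    """Fraction of conformer A implied by j_obs, given limiting couplings j_a and j_b."""
    return (j_obs - j_b) / (j_a - j_b)


# Hypothetical limiting couplings of 10 Hz (extended) and 2 Hz (folded).
j_mean = averaged_coupling([0.5, 0.5], [10.0, 2.0])
fraction_extended = two_state_population(6.0, 10.0, 2.0)
```

The inversion is only as reliable as the assumed limiting couplings, which are rarely known precisely for a flexible side chain.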

2.3 Charge State 

An advantage of NMR over other structural techniques such as X-ray crystallography is that it has the potential to provide information not only on structure but also on the electronic properties of molecules. Many drug leads contain ionizable groups and a determination of their charge state in solution and/or at the bound site is important in the design of analogs. Simple plots of chemical shifts as a function of pH for nuclei near these ionizable groups provide a convenient way of determining the pKa value and hence charge state. This is illustrated for ziconotide in Fig. 12.12, where it was suspected that one of the ionizable groups in the molecule, Asp14, may be involved in a stabilizing salt-bridge interaction (28). This was confirmed by noting that the pKa value for this residue is lowered considerably relative to the usual value for Asp. The β-proton chemical shifts were essentially independent of pH over the range 3-7 (indicating a pKa < 3), whereas those of a control random-coil peptide titrated as expected over this range.
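
A titration curve of this kind is commonly modeled with the Henderson-Hasselbalch relationship under fast exchange, so that the observed shift is the population-weighted average of the shifts of the protonated and deprotonated forms. The sketch below uses the control-peptide pKa of 3.7 quoted in Fig. 12.12; the two limiting shifts are hypothetical values chosen only for illustration.

```python
def observed_shift(ph, pka, shift_protonated, shift_deprotonated):
    """Fast-exchange chemical shift (ppm) of a nucleus near a single ionizable group."""
    ratio = 10.0 ** (ph - pka)  # [deprotonated] / [protonated]
    return (shift_protonated + shift_deprotonated * ratio) / (1.0 + ratio)


# Hypothetical limiting shifts for a beta-proton next to a carboxyl group.
SHIFT_COOH = 2.90  # fully protonated form, ppm (assumed)
SHIFT_COO = 2.70   # fully deprotonated form, ppm (assumed)

# Shift at each whole pH unit from 2 to 7 for a group with pKa 3.7.
curve = [(ph, observed_shift(ph, 3.7, SHIFT_COOH, SHIFT_COO)) for ph in range(2, 8)]
```

For a group with pKa 3.7 most of the shift change falls between about pH 3 and 5. A group whose pKa lies below 3 would give an essentially flat curve over pH 3-7, which is the behavior reported for Asp14.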

2.4 Tautomeric Equilibria 

Tautomerization is a relatively common feature of drug molecules that is amenable to analysis through the use of chemical shifts or coupling constants as probes. This was recently demonstrated in a study of some nonpeptide endothelin analogs (49). Starting from the modestly active compound (1) (Table 12.5), derived by screening a compound library for ETA antagonists, the nanomolar inhibitor (2) was developed. Further optimization through examination of electronic and structural requirements led to the subnanomolar inhibitor (3), which was subsequently put forward for evaluation in a number of preclinical disease models for stroke.

These molecules display keto-enol tautomerization, as illustrated in the following structures. The open form keto-acid salts and the closed form butenolides exist in a pH-dependent equilibrium in solution, and at physiological pH both forms exist. In principle, the biological activity could reside in either or both forms.

The extent of tautomerization was established by evaluation of NMR spectra as a function of pH, from 2.65 to 9.05 (49). At acidic pH, compound (2) exists essentially in the closed butenolide form. As the pH is slowly raised by addition of NaOD, the spectrum begins to exhibit properties associated with the open form keto-acid, and at basic pH the compound is essentially all in the open form. The coupling pattern shown by the benzylic protons is a particularly characteristic marker of the tautomeric process. At acidic pH the benzylic protons exhibit an AB quartet pattern consistent with the ring-closed structure. As the pH is raised this pattern coalesces to a singlet, broad at neutral pH and sharp at basic pH, as would be expected with the open form keto-acid structure. Once the solution was basic, addition of DCl to reacidify it caused

Table 12.5 Substitution Pattern and Receptor-Binding Affinity of Nonpeptidic Endothelin Antagonists(a)

Compound       R1     R2                           ETA     ETB
(1) PD012527   Cl     H                            430     27000
(2) PD155080   OCH3   H                            >0.4    4550
(3) PD156707   OCH3   3,4,5-OCH3                   0.3     780
(4)            OCH3   3,5-OCH3, 4-O(CH2)3SO3Na     0.38    1600

(a) From Refs. 49 and 50.

the spectrum to return to its original appearance, consistent with a reversible tautomerization process.

Identical biological results were obtained with the salt and closed butenolide form in all pharmacological assays, reflecting equilibration at physiological pH. This made it difficult to identify the biologically active form from these experiments alone, although methylation of the OH group in compounds 1-3 resulted in a loss of activity. Because these analogs cannot tautomerize to form open keto-acids, it seems likely that the open form is responsible for activity.

In addition to its impact on the biologically active form, the tautomeric process has profound implications for formulation of drug candidates, as illustrated in some recent further development work on compound (3) (50). Although it is easy to synthesize and isolate water-soluble salts of the keto-acids, once they are placed in aqueous solution the tautomeric equilibrium determines how much of each form is present. Indeed, if the closed butenolide tautomer is sufficiently water insoluble, it can precipitate out of solution and the equilibrium can drive the complete precipitation of the compound. Although (3) has good oral activity, its intravenous use is limited by the insolubility of the closed-form butenolide tautomer unless a specific and complex buffered formulation is used. Thus in recent work a series of water-soluble butenolides was developed (50) to overcome this limitation for parenteral uses. This culminated in the development of (4) (Table 12.5), currently in preclinical evaluation.

This description of the development of (4) 
provides a good illustration of the fact that the 
availability of an active molecule is not the end 
of the drug development pathway, and that 
formulation considerations can be critical. In 
this case NMR played a significant role in un¬ 
derstanding tautomeric processes that had a 
direct bearing on solubility and hence formu¬ 
lation. 

2.5 Ligand Dynamics: Line-Shape 
and Relaxation Data 

It is increasingly being recognized that the so¬ 
lution molecular dynamics of drugs may have 
an important role in modulating biological ac¬ 


tivity (48, 51-54). For example, dynamics may 
influence entropic contributions to the free 
energy of binding. In general the more flexible 
a ligand is, the more unfavorable will be the 
loss in entropy on binding, assuming a rela¬ 
tively rigid bound state of the ligand. How¬ 
ever, in some cases flexibility of a ligand may 
be a positive factor. This applies, for example, 
if a degree of flexibility is required to allow a 
ligand access to a buried active site, or if acti¬ 
vation of a receptor requires a conformational 
change mediated by ligand binding (9). There¬ 
fore, a knowledge of the flexibility of lead mol¬ 
ecules is an important supplement to the 
structural and electronic information avail¬ 
able from NMR. 

The two major NMR methods for obtaining information on ligand flexibility are line-shape analysis and relaxation measurements (usually 13C or 15N T1, T2, or heteronuclear NOE measurements). In general terms, the former is sensitive to motions on the milli- to microsecond timescale and the latter to nanosecond timescales. To some extent, structure calculations on peptide-based lead molecules can also give an indication of regions of flexibility from an examination of local regions of disorder among a family of calculated structures. Caution must be exercised because other factors can contribute to disorder, although in many cases there is a connection between disorder in a structural ensemble and molecular flexibility (55). A recent example concerns the solution structures of three isomers of the α-conotoxin GI (56). Attempts to increase structural diversity through the engineering of nonnative disulfide bonds showed that nonnative isomers were not only different in conformation but were also considerably more flexible than the native isomer and had reduced activity.

In an example that illustrates the application of NMR relaxation measurements for studying ligand flexibility, Kessler and colleagues (57) investigated the role of disulfide bonds in the α-amylase inhibitor tendamistat. This small protein contains two disulfide bonds (C11-C27 and C45-C73) and opening of the latter is known to reduce the melting temperature of the protein (i.e., reduce its stability), but in this case does not affect its α-amylase inhibitor function. The latter observation

Table 12.6 13C-NMR Chemical Shift and Relaxation Data for Thyroxine

                                                    Theoretical          Theoretical
                                Experimental(a)     Isotropic Motion     Two-State Internal Motion
Position   Chemical Shift (ppm)   T1 (s)   NOE      T1 (s)   NOE         T1 (s)   NOE
C2',6'     127.3                  0.63     2.53     0.63     2.96        0.63     2.53
C2,6       142.6                  0.63     2.63     0.63     2.96        0.63     2.58
C-α        57.1                   0.51     2.37     0.51     2.94        0.51     2.57
C-β        36.4                   0.64     2.29     0.64     2.96        0.64     2.51

Figure 12.13. Schematic illustrations of motions of the outer ring of thyroxine. The dotted line through the outer ring shows the jump axis about which the ring rotates. (a) Ha is shown in the proximal position and is closer to the viewer than Hb because the torsion angle φ' is greater than 0°. This conformation corresponds to one of the two states of the two-state jump model and agrees with the "twist" of the outer ring observed in the crystal structure. (b) Rotation about the dotted line through the center of the outer ring moves Ha away from the viewer and brings Hb toward the viewer. This corresponds to the second state of the outer ring in the two-state jump model. (c) Hb is now in the proximal position and closer to the viewer than Ha. (d) Hb is in the proximal position and is now further from the viewer than Ha. Transition from a to b and from c to d involves small amplitude jumps on the nanosecond timescale and is detected by NMR relaxation measurements. Although not illustrated in the figure, the inner ring also exhibits this type of motion. Transitions a to c and b to d result in 180° flips of the outer ring and exchange of the environments of Ha and Hb. This ring flip occurs on a microsecond timescale and is detected by variable temperature line-shape studies. (Reprinted with permission from Ref. 52. Copyright 1996 American Chemical Society.)


demonstrated the presence of additional larger amplitude, but slower ring flips (60). At low temperature two signals were seen for the H2' and H6' protons. These signals broadened with increasing temperature, then coalesced and sharpened as the temperature was further increased. This was attributed to exchange of the environments of the two protons brought about by 180° rotation of the "outer" ring of thyroxine. Substitution of the observed coalescence temperature (Tc) and the chemical shift difference of the two signals at low temperature (δν) allowed the free energy of activation for this slow ring flip process to be established from equation 12.1 (53, 60).


\[ \Delta G^{\ddagger} = 19.14\, T_c \left[ 9.97 + \log\!\left( T_c / \delta\nu \right) \right] \qquad (12.1) \]

The derived barriers for several thyroid hor¬ 
mones are in the range 36-38 kJ/mol, which 
corresponds to large-amplitude ring flips on 
the milli- to microsecond timescale. From a 
combination of the relaxation data and the dy¬ 
namic line-shape analysis data it was possible 
to propose a unified model that accounts for 
both the fast and slow internal motions, as 
summarized in Fig. 12.13. 
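
Equation 12.1 is straightforward to evaluate numerically. The sketch below assumes Tc in kelvin, δν in hertz, and a base-10 logarithm, in which case the result is in J/mol (19.14 is the gas constant multiplied by ln 10). As a cross-check it also evaluates the Eyring expression directly, using the coalescence rate constant k = πδν/√2 for two equally populated uncoupled sites. The coalescence temperature and shift difference in the example are hypothetical and are not the thyroxine values.

```python
import math

R = 8.314462618        # gas constant, J/(mol K)
K_B = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J s


def activation_free_energy(t_c, delta_nu):
    """Free energy of activation (J/mol) from equation 12.1.

    t_c: coalescence temperature in K; delta_nu: shift difference in Hz.
    """
    return 19.14 * t_c * (9.97 + math.log10(t_c / delta_nu))


def eyring_free_energy(t_c, delta_nu):
    """Same quantity from the Eyring equation with k = pi * delta_nu / sqrt(2)."""
    k_c = math.pi * delta_nu / math.sqrt(2.0)
    return R * t_c * math.log(K_B * t_c / (H * k_c))


# Hypothetical example: coalescence at 180 K with a 100 Hz separation.
barrier_kj = activation_free_energy(180.0, 100.0) / 1000.0
```

For these assumed inputs the barrier comes out at about 35 kJ/mol, the same order as the 36-38 kJ/mol quoted for the thyroid hormones, and the two routes agree to within rounding of the constants 19.14 and 9.97.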

In this model, both aromatic rings of the thyroid hormones jump rapidly between two energetically equivalent conformations on a nanosecond timescale (a ↔ b and c ↔ d in Fig. 12.13). The half-angle of the jump varies, depending on the solvent, corresponding to an average displacement of about 90° between the two extreme jump positions. These separate states are not detectable on the chemical-shift timescale but lead to an average proximal environment for Ha and an average distal environment for Hb (attributed to rapid interchange between a and b in Fig. 12.13), which are seen in the low temperature spectra. However, these fast motions are detected by relaxation studies. Although the rate of this motion is rapid, its amplitude is not sufficient to average the environment of proximal and distal protons. Occasionally (about once every 1000 jumps) the outer ring jumps further than the nominal 90° range, exchanging the environments of the proximal and distal protons (a ↔ c and b ↔ d in Fig. 12.13). Although the actual rate of an individual ring flip is rapid, the effective rate of the process is on the microsecond timescale, because on average a large number of small amplitude jumps occur for every large amplitude ring flip. It is the exchanging of proximal and distal protons on the microsecond timescale that is detected by the variable-temperature line-shape studies.

The fact that thyroxine is apparently able to move so freely over a moderately large region of conformational space has implications for receptor binding. The crystal structure of the thyroid receptor ligand-binding domain complexed with the thyroid agonist 3,5-dimethyl-3'-isopropylthyronine (59) shows that the thyroid hormones bind at the center of the hydrophobic core of the ligand-binding domain and may play a structural role in the conformational changes that activate the receptor. The structures of the retinoid-X receptor ligand-binding domain (61) and the retinoic acid-retinoic acid receptor ligand-binding domain complex (62) indicate that significant conformational changes accompany ligand binding in those cases. The conformational flexibility exhibited by the thyroid hormones may also be required for binding. It has been suggested that the rapid "wiggling" of the aromatic rings could enable the hormone to work its way to the center of the ligand-binding domain as the protein reorders itself about the ligand and may in fact trigger receptor conformational changes (52).

As briefly mentioned earlier, taxol provides another example where relaxation time measurements provide an insight into dynamic processes. Although it is generally thought that the taxane core is rigid, 13C relaxation data suggest that a degree of flexibility (on the nanosecond timescale) may be present and may vary for different taxol analogs (48). In particular it appears that the removal of certain side chains may introduce additional flexibility into the core region that would not easily be predicted based on a simple inspection of the structure.

Another example of the application of line-shape analysis to ligand dynamics is described in Section 3.2 for the drug trimetrexate when bound to dihydrofolate reductase (DHFR). From that example and earlier studies on DHFR (63-65), it is clear that the techniques described above can equally be applied to ligands when bound to their receptor. In some cases significant but highly specific mobility appears to be present at the bound site.

2.6 Pharmacophore Modeling: Conformations of a Set of Ligands

Determination of the conformations of a range of ligands that all act at the same receptor site can provide significantly more information than just a single ligand structure. With a sufficiently broad range of ligands, it is often possible to generate a pharmacophore model of the receptor site, deduced from conserved structural features and the conformations of the ligands. This has been done recently, for example, for the ω-conotoxins, the broad class of conotoxins to which ziconotide (or MVIIA, mentioned above) belongs. From structural studies of a range of ω-conotoxins and from literature data on various mutants with altered binding affinities, it was determined that only a localized region of the surface of these molecules is involved in receptor binding (66). This allowed a pharmacophore model of putative receptor-binding pockets to be developed.

The advantage of such a pharmacophore model is that smaller, nonpeptide molecules that might have improved stability and bioavailability over their peptidic counterparts can, in principle, be designed. The NMR approach used in such pharmacophore modeling often involves a combination of many of the techniques already described. By determining information about structure and electronic properties for a range of different ligands, all acting at the same receptor site, it is often possible to infer information about the binding site, even if direct structural studies of this site are not possible.

2.7 Limitations of Analog-Based Design 

Although a determination of the structure of 
bioactive molecules is of key importance, there 
are distinct limitations on the use of solution 
structures for drug design. In particular, un¬ 
less the molecule is rigid there is no certainty 
that the solution conformation is the same as 
the bioactive bound conformation. For this 
reason there has been a shift over recent years 
to approaches in which information about the 
bound state is obtained. A second approach
has been to probe the bound conformation by
making a range of constrained analogs of a 
flexible lead molecule, as illustrated earlier for 
endothelin. 

The most direct way of determining the 
conformation of a drug lead is to determine 
the full three-dimensional structure of its re¬ 
ceptor complex. This has now been achieved in 
a significant number of cases but represents a 
substantial undertaking, as described in later 
sections of this chapter. A simpler approach 
that has also been applied is to use transferred 
NOE methods, as described below. This ap¬ 
proach fits at the interface of ligand-based de¬ 
sign and receptor-based design. It fits with the 
former because no knowledge of the receptor 
structure is required, but it also fits with the 
latter because it requires the macromolecule 
of interest to be included in the mixture to be 
analyzed. It is appropriate therefore to intro¬ 
duce the topic here but also to discuss it fur¬ 
ther in Section 3. 

2.8 Conformation of Bound Ligands: 
Transferred NOEs 

In ligand-based drug design it is not necessary to know the structure of the receptor, or even the location of the binding site, although the conformation of the ligand bound to the receptor is crucial. It is clearly better if this can be measured directly rather than be inferred from the conformation of the free ligand. In certain circumstances this information on the bound conformation can be obtained from the transferred NOE (TrNOE) technique (67, 68). This method takes advantage of the fact that NOEs build up more rapidly in a ligand-macromolecule complex than they do in free ligand. Given appropriate exchange conditions for a mixture of ligand and macromolecule (typically satisfied for KD ≥ 10^-7 M), signals from the free ligand may be used to determine the bound conformation.

The theory of the technique was reviewed previously (69) and recent developments that minimize potential artifacts from spin diffusion have been described (5). Because it is not necessary to monitor signals from the macromolecule in this technique, it is usually present in substoichiometric amounts, thus requiring only minimal amounts of what is sometimes the more expensive component of ligand-macromolecule complexes. In addition, the molecular weight restrictions inherent in full 3D-structure determinations of complexes are ameliorated and the conformations of ligands bound to very large macromolecules may be determined. For example, the technique was recently used to determine the structure of an antibiotic bound to the ribosome (70). A range of other applications including enzyme-substrate, protein-carbohydrate, and protein-peptide interactions have recently been summarized (5).

In addition to its application as a tool for determining bound conformations of ligands, the TrNOE method has also been used recently as a screening aid for the identification of ligands from mixtures that bind to a protein of interest. This application is addressed in more detail later in this chapter.

3 RECEPTOR-BASED DESIGN 

Receptor-based design refers to the process of 
determining the three-dimensional structure 
of a macromolecular target and using this in¬ 
formation to design ligands to interact with it. 
In general there have been few cases where 
the structure of a macromolecule or receptor
alone has been successfully used to design, de
novo, a ligand to interact with that receptor. 
However, such an approach is likely to become 
more common with improved computer-based 
approaches to molecular design in the future 
(71). Currently, the most common approach is
to study a ligand-macromolecule complex and 
to initiate the design process based on the in¬ 
teraction of the lead ligand with the macro¬ 
molecule. 

Although the structure of the macromole¬ 
cule alone is of less interest than that of the 
complex, in many cases a determination of the 
structure of the complex follows from earlier 
studies on the unbound macromolecule. It is 
thus useful to describe the approaches to 
structure determination of macromolecular 
targets. This is followed by a discussion of the
dynamic aspects of protein structures in
Section 3.1.4, before addressing the main topic
of macromolecule-ligand interactions in
Section 3.2.

3.1 Macromolecular Structure 
Determination 

The two major techniques for determining 
three-dimensional structures of proteins or 
nucleic acids are X-ray crystallography and 
NMR spectroscopy. The crystallographic approach
to structure determination is described
elsewhere in this volume, and here the
focus is on NMR. NMR has been used to deter¬ 
mine the structures of proteins only for about 
the last 15 years, with the first NMR structure 
determination being made in 1985. NMR has a 
number of advantages over X-ray crystallogra¬ 
phy, including the fact that the requirement 
that the protein needs to be crystallized is 
avoided, and that the dynamic information 
available from NMR studies complements the 
structural information. A major disadvantage 
of NMR spectroscopy, though, is that it is cur¬ 
rently limited to the determination of struc¬ 
tures of <35 kDa. With the development of 
new NMR techniques, such as TROSY (20), 
this seems certain to increase significantly 
over coming years, although the fact remains 
that among all structures currently deposited 
in the protein database the average size of 
NMR structures is about 8 kDa (72), substan¬ 
tially smaller than the average size of protein 
structures determined by X-ray crystallography.
Despite this limitation, NMR has made
major inroads into the macromolecular struc¬ 
ture determination process, and currently ap¬ 
proximately one-fifth of all new structures de¬ 
posited in the protein database have been 
determined by NMR spectroscopy. 

3.1.1 Overview of Approach. The basis for 
structure determination by NMR is that, by 
determining a large number of distance re¬ 
straints between pairs of protons, it is possible 
to reconstruct a three-dimensional image of 
the molecule. These distance restraints are derived
primarily from nuclear Overhauser effect (NOE)
measurements, which detect distances up to
about 5 Å. Over recent years such
distance restraints have been supplemented 
by a range of other restraints, including dihe¬ 
dral angle restraints derived from coupling 
constant measurements and orientation re¬ 
straints derived from residual dipolar cou¬ 
plings. These restraints are input into a simu¬ 
lated annealing algorithm, which is used to 
calculate a family of structures consistent 
with the restraints. 
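
The idea of searching for coordinates that satisfy a set of distance restraints can be sketched in a few lines. The example below is a toy illustration only: it uses four hypothetical protons, upper-bound restraints of 5 Å on every pair, a simple squared-violation penalty, and a Metropolis annealing loop over Cartesian coordinates. The function names, step size, and cooling schedule are assumptions made for illustration; real structure calculations also include covalent geometry, lower bounds, and the other restraint types mentioned above.

```python
import math
import random


def restraint_energy(coords, restraints):
    """Sum of squared violations of upper-bound distance restraints.

    coords     -- list of (x, y, z) positions in angstroms, one per proton
    restraints -- list of (i, j, upper_bound) tuples
    """
    total = 0.0
    for i, j, upper in restraints:
        d = math.dist(coords[i], coords[j])
        if d > upper:  # distances inside the bound cost nothing
            total += (d - upper) ** 2
    return total


def anneal(coords, restraints, steps=5000, t_start=5.0, t_end=0.01, seed=0):
    """Toy Metropolis simulated annealing over Cartesian coordinates."""
    rng = random.Random(seed)
    current = [tuple(c) for c in coords]
    e_cur = restraint_energy(current, restraints)
    best, e_best = list(current), e_cur
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        k = rng.randrange(len(current))
        trial = list(current)
        trial[k] = tuple(x + rng.gauss(0.0, 0.5) for x in current[k])
        e_new = restraint_energy(trial, restraints)
        # Always accept downhill moves; accept uphill moves with a
        # Boltzmann probability so the search can leave local minima.
        if e_new <= e_cur or rng.random() < math.exp((e_cur - e_new) / t):
            current, e_cur = trial, e_new
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return best, e_best


# Four hypothetical protons, initially far apart; every pair restrained to <= 5 A.
start = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)]
pairs = [(i, j, 5.0) for i in range(4) for j in range(i + 1, 4)]
model, energy = anneal(start, pairs)
print(f"violation energy: {restraint_energy(start, pairs):.1f} -> {energy:.3f}")
```

Repeating the calculation with different seeds gives a family of models, which loosely mirrors the way an NMR ensemble is generated from many independent annealing runs.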

NMR is unique in that it can provide de¬ 
tailed and specific information on molecular 
dynamics in addition to structural informa¬ 
tion. The use of relaxation time measure¬ 
ments allows the relative mobility of individ¬ 
ual atomic positions within a macromolecule 
to be determined. The dynamic information 
obtained includes not only the rates or fre¬ 
quencies of internal motions but also their am¬ 
plitudes. Such amplitudes are often expressed 
by order parameters. Not surprisingly, it is 
observed in many cases that the termini of 
proteins are more flexible than internal re¬ 
gions. More interestingly, NMR has provided 
a number of examples where internal loops in 
proteins have been shown to have dynamics 
that may be associated with their function. A 
good example of this is HIV protease, where 
NMR studies have identified reduced-order 
parameters in the flap region of the molecule 
that may reflect flexibility to allow entry of 
substrates or inhibitors into the active site. 

In summary, a major strength of NMR is 
that a global picture not only of the structure 
but also of the dynamics of the macromolecu¬ 
lar target is obtained. Further, NMR provides 
information on ionization states of titratable
groups and other electronic features within
macromolecules that may have an impact on 
ligand binding and function. 

3.1.2 Sample Requirements and Assignment 
Protocols. Structure determination by NMR 
typically requires 500 µL of a 1-2 mM solution
of the protein of interest. It is important that 
the macromolecule does not aggregate be¬ 
cause this causes spectral broadening and may 
preclude assignment. The sample should pref¬ 
erably be stable in solution over the extended 
period of time required to collect the range of 
NMR experiments (73-75) needed for assign¬ 
ment and structure determination. Individual 
experiments may last from a few hours to sev¬ 
eral days, with several weeks of data acquisi¬ 
tion required for studies of larger proteins. 

The particular set of NMR experiments re¬ 
quired for NMR structure determination de¬ 
pends on the size of the protein. It suffices to 
say that for smaller proteins (≤7 kDa) it is
usually possible to determine the structures, 
mainly using 2D NMR, without the need for 
isotopic labeling, by use of procedures described
in detail above for Ziconotide. For proteins
in the range 7-14 kDa, 15N-labeling and
a combination of 2D/3D NMR experiments is
usually sufficient, whereas for larger proteins
13C/15N labeling and 3D or 4D NMR is more or
less mandatory. For proteins at the top end of
the currently accessible range (25-35 kDa), 
there are additional advantages associated 
with partial deuteration of the protein. 
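
The size-dependent guidance above can be written as a small lookup. This is only a sketch: the function name is hypothetical, and the exact cut-offs between classes (in particular where "larger proteins" end and the 25-35 kDa class begins) are an assumed reading of the ranges quoted in the text.

```python
def suggest_nmr_strategy(mass_kda):
    """Return the labeling and dimensionality guidance quoted in the text.

    The boundaries between classes are an assumed interpretation of the
    approximate ranges given in the text, not hard limits.
    """
    if mass_kda <= 0:
        raise ValueError("mass must be positive")
    if mass_kda <= 7:
        return "no isotopic labeling, mainly 2D NMR"
    if mass_kda <= 14:
        return "15N labeling, 2D/3D NMR"
    if mass_kda < 25:
        return "13C/15N labeling, 3D or 4D NMR"
    if mass_kda <= 35:
        return "13C/15N labeling, 3D or 4D NMR, partial deuteration"
    return "above the approximately 35 kDa limit quoted in the text"


for mass in (3, 10, 20, 30, 60):
    print(mass, "kDa:", suggest_nmr_strategy(mass))
```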

3.1.3 Recent Developments. A number of 
recently developed methods offer the potential 
for improving the quality of NMR structures 
and for increasing the size of proteins that can 
be examined. In particular, the use of residual 
dipolar couplings and of anisotropic contribu¬ 
tions to relaxation provide new kinds of re¬ 
straints that promise to lead to more accurate 
NMR structures (74, 76). As already mentioned,
the TROSY method (20) exploits relaxation
phenomena to produce spectra with narrow
lines and promises to significantly expand
the size of protein targets that can be exam¬ 
ined by NMR, from the current limit of about 
35 kDa to perhaps >100 kDa. 

Another development that is likely to have 
a significant impact is the increasing number
of structural genomics programs being developed.
The demands arising from such programs
will no doubt stimulate new methods
for the large-scale production of labeled pro¬ 
teins (77, 78), and for speeding up the rate of 
structure determination by both NMR and 
crystallography. 

3.1.4 Dynamics. Proteins exhibit a range 
of internal motions, from the millisecond to 
nanosecond timescale, and a full understand¬ 
ing of how small drugs might interact with 
such a "moving target" requires more than 
just the time-averaged macromolecular struc¬ 
ture. Thus, over recent years much effort has 
been directed toward defining motions within 
proteins.

The most commonly applied approach has 
been to use 13C or 15N relaxation parameters
such as T1, T2, and the heteronuclear NOE to
derive correlation times for overall motion, to¬ 
gether with rates and amplitudes of internal 
motions (79). Although the precise interpreta¬ 
tion of the NMR relaxation data in terms of 
motional parameters remains dependent on 
the appropriateness of the motional model 
chosen, the results from many studies on the 
dynamics of proteins are sufficiently clear to 
confirm that nanosecond timescale motions in 
proteins are common. The functional signifi¬ 
cance of motions on the nanosecond timescale 
remains unclear and so far there have been 
few cases where significant differences in mo¬ 
tions on this timescale between ligand-free 
and ligand-bound forms of proteins have been 
measured. It will be interesting to assess the 
functional significance of such motions as 
more data become available. However, slower 
motions have been correlated with function in 
a number of proteins, with a good example 
being HIV protease, described in more detail 
in Section 4.2. 

Relaxation measurements require a considerable
investment of spectrometer time
and in some cases it may be possible to derive 
basic information about molecular dynamics 
from the structural ensemble alone. Although 
regions of disorder can reflect factors other 
than dynamics, a recent analysis (55) suggests 
that ill-defined regions in structural ensembles
often do reflect slow, large-amplitude motions.
Even if relaxation measurements are
done, it is often not necessary to undertake
extensive analysis to derive correlation times, 
given that trends are often apparent from the 
raw experimental data. For example, in the 
case of tendamistat, described above, it is clear
directly from the heteronuclear NOE data 
that significant internal mobility is present at 
the N-terminus. 

3.1.5 Nucleic Acid Structures. Most of the 
discussion on macromolecular targets so far 
has focused on proteins. DNA represents an¬ 
other valuable target in drug design. Most 
studies in which DNA is the target are done 
using short model oligonucleotides to mimic 
the binding region of DNA. The regular re¬ 
peating nature of DNA structures makes this 
a more successful approach than similar at¬ 
tempts to dissect out binding regions of recep¬ 
tor proteins, where often the whole protein 
must be present to maintain a viable binding 
site. Similar comments apply for RNA, where 
improvements in synthetic methods have led 
to an increasing number of structure determi¬ 
nations over recent years. The principles in¬ 
volved in structure determination of nucleic 
acid targets are similar to those of proteins, 
but in practice nucleic acid structures are 
somewhat more difficult to solve. 

3.1.6 Challenges for the Future: Membrane- 
Bound Proteins. The majority of targets for 
currently known drugs are membrane-bound 
receptors, yet this represents the class of pro¬ 
teins for which least structural information is 
known. Membrane proteins are notoriously 
difficult to characterize at a structural level 
because they are difficult to crystallize, thus 
inhibiting X-ray crystallographic studies, and 
are both too large and too difficult to reconsti¬ 
tute in suitable media for NMR studies. Nev¬ 
ertheless, solid-state NMR methods are begin¬ 
ning to show promise that eventually such 
targets may be structurally characterized 
(80). Rotational resonance solid-state NMR 
measurements, for example, allow precise dis¬ 
tances to be measured in membrane-bound 
proteins (81). 

3.2 Macromolecule-Ligand Interactions 

Macromolecule-ligand interactions are integral
to a wide range of biological processes,
including hormone, neurotransmitter, or drug
binding, antigen recognition, and enzyme-substrate
interactions. Fundamental to each
of these interactions is the recognition by a 
ligand of a unique binding site on the macro¬ 
molecule. Through an understanding of the 
specific interactions involved it may be possi¬ 
ble to design or discover analogous ligands 
with altered binding properties that might in¬ 
hibit the biochemical function of the macro¬ 
molecule in a highly specific manner. The 
study of macromolecule-ligand interactions 
thus forms the cornerstone of most structure- 
based drug design applications. The macro¬ 
molecule of interest may be a protein or a nu¬ 
cleic acid, although the majority of drug design 
applications have focused on protein-ligand 
interactions. For this reason we will refer 
mainly to protein-ligand interactions in the 
following discussion, but will include some ex¬ 
amples of drug-DNA interactions. 

3.2.1 Overview. There are several impor¬ 
tant aspects of macromolecule-ligand interac¬ 
tions that have a bearing on structure-based 
design. The simplest question that might be 
asked is "what is the strength of the binding 
interaction?," whereas the most detailed task 
would be to precisely define the atomic coordinates
of the complete protein-ligand complex.
In between these extremes there are many 
other questions important to the drug design 
process; these include questions about the 
binding stoichiometry and kinetics, the con¬ 
formation of the bound ligand, and about the 
nature of functional group interactions be¬ 
tween the protein and bound ligand. These 
and other important questions were intro¬ 
duced briefly in Table 12.3 and are examined 
in more detail later in this section. Before do¬ 
ing this it is first necessary to consider NMR 
timescales because the ability of NMR methods
to answer questions about macromolecule-ligand
complexes depends critically on
the kinetics of the binding interaction. Section 
3.2.2 thus describes how various NMR param¬ 
eters depend on binding kinetics and in partic¬ 
ular how fast- and slow-exchange conditions 
affect the interpretation of NMR data. 

Table 12.7 NMR Parameters and Their Changes on Binding

Parameter            Difference          Typical Magnitude(a)
Chemical shift       ν_L − ν_ML          0-1000 s^-1
                     ν_M − ν_ML          0-500 s^-1
Coupling constant    J_L − J_ML          0-12 s^-1
                     J_M − J_ML          0-12 s^-1
Relaxation rate(b)   1/T_L − 1/T_ML      0-50 s^-1 (for T1, larger for T2)
                     1/T_M − 1/T_ML      0-10 s^-1

(a) Ranges are approximate only and larger effects may be seen in some cases.
(b) 1/T refers to either 1/T1 or 1/T2.

Having identified the exchange regime, the
task then becomes to decide which NMR parameters
can be used to answer the questions
posed above about the complex. Many of the
NMR parameters that were described earlier 
for deriving information about ligands are also 
applicable to studies of complexes. These in¬ 
clude chemical shifts, NOEs, and relaxation 
parameters. However, the presence of two in¬ 
teracting partners means that there are some 
differences in the way such parameters are 
measured and this has led to the development 
of several techniques that are particularly im¬ 
portant for the study of macromolecule-ligand 
interactions, including chemical-shift map¬ 
ping, isotope editing, and various NMR titra¬ 
tions. Section 3.2.3 describes these tech¬ 
niques. Finally, illustrative examples of the 
application of these techniques to specific drug 
design problems are given in Section 3.2.4. 

3.2.2 Influence of Kinetics and NMR Timescales.
Macromolecule-ligand interactions are
characterized by an equilibrium reaction that
potentially has a wide range of affinities and
rates:

M + L ⇌ ML

The rate constant for the forward reaction is
referred to as the on rate (k_on), whereas dissociation
of the complex is characterized by the
reverse rate constant, k_off. The equilibrium
constant for this interaction, represented in
terms of the dissociation constant of the complex,
K_D, reflects a balance of the on and off
rates, as shown in Equation 12.2:

$$K_D = \frac{[\mathrm{M}][\mathrm{L}]}{[\mathrm{ML}]} = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}} \qquad (12.2)$$

For many protein-ligand interactions k_on is of
the order of 10^8 M^-1 s^-1, and is typically quite
similar for different ligands. The observation
that K_D values may vary over a wide range,
typically from millimolar to nanomolar (i.e.,
K_D = 10^-3 to 10^-9 M) for cases of interest, is a
reflection of a variation in k_off for different
ligands. Consideration of the k_on value above
and the range of K_D values noted suggests a
range in k_off from 10^-1 to 10^5 s^-1. The lifetime
of the bound complex (τ_ML = 1/k_off) may thus
vary from much less than a millisecond to tens
of seconds (10^-5 to 10 s based on the above off
rates). The exchange rate for the second-order
binding process is given by (82):

$$k = \frac{1}{\tau} = \frac{1}{\tau_{ML}} + \frac{1}{\tau_{L}} = k_{\mathrm{off}}\left(1 + \frac{p_{ML}}{p_{L}}\right) \qquad (12.3)$$

where p_ML and p_L are the mole fractions of
bound and free ligand, respectively.
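
The arithmetic behind these ranges is easy to reproduce. The short sketch below assumes the typical on rate of 10^8 M^-1 s^-1 quoted above, rearranges Equation 12.2 to give the off rate, and evaluates Equation 12.3 for a chosen bound fraction. The function names are illustrative only.

```python
K_ON = 1e8  # M^-1 s^-1, the typical on rate quoted in the text


def off_rate(k_d, k_on=K_ON):
    """k_off = K_D * k_on, from Equation 12.2."""
    return k_d * k_on


def bound_lifetime(k_off):
    """Lifetime of the complex, tau_ML = 1 / k_off, in seconds."""
    return 1.0 / k_off


def exchange_rate(k_off, p_bound):
    """Equation 12.3: k = k_off * (1 + p_ML / p_L) for the ligand."""
    if not 0.0 <= p_bound < 1.0:
        raise ValueError("p_bound must be in [0, 1)")
    p_free = 1.0 - p_bound
    return k_off * (1.0 + p_bound / p_free)


for k_d in (1e-3, 1e-6, 1e-9):  # millimolar, micromolar, nanomolar
    k_off = off_rate(k_d)
    print(f"K_D = {k_d:.0e} M: k_off = {k_off:.3g} s^-1, "
          f"lifetime = {bound_lifetime(k_off):.3g} s")
```

Note that the exchange rate in Equation 12.3 grows as the bound fraction of ligand increases, so the same complex can appear to exchange faster early in a titration than it does in the presence of a large excess of ligand.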

The appearance of an NMR spectrum of a 
protein-ligand complex is dependent on the 
rate of chemical interchange between free and 
bound states. In particular, the effects of ex¬ 
change on an individual NMR parameter (e.g., 
chemical shift, coupling constant, or relax¬ 
ation rate) depend on the relative magnitude 
of the exchange rate and the difference in the 
NMR parameter between the two states. The 
cases where the rate of interchange is greater 
than, about equal to, or less than, the param¬ 
eter difference are referred to as fast, interme¬ 
diate, and slow exchange, respectively, as in¬ 
dicated in Table 12.7. 
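
The comparison that defines the three regimes can be expressed as a simple rule. In the sketch below the function name and the factor of 10 used to separate "much greater than" from "about equal to" are assumptions chosen for illustration; the text itself does not give a numerical threshold. Both quantities are in s^-1, as in Table 12.7.

```python
def exchange_regime(k_ex, delta, factor=10.0):
    """Classify exchange by comparing the rate with the parameter difference.

    k_ex   -- exchange rate between free and bound states (s^-1)
    delta  -- difference in the NMR parameter between the states (s^-1)
    factor -- assumed margin separating fast/slow from intermediate
    """
    if k_ex < 0 or delta < 0:
        raise ValueError("rates and differences must be non-negative")
    if k_ex >= factor * delta:
        return "fast"
    if k_ex * factor <= delta:
        return "slow"
    return "intermediate"


# A 100 s^-1 exchange rate against differences drawn from the ranges in Table 12.7.
for label, delta in (("large shift difference", 1000.0),
                     ("modest shift difference", 100.0),
                     ("coupling constant difference", 5.0)):
    print(label, "->", exchange_regime(100.0, delta))
```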

Figure 12.14. Schematic illustration of the effects of slow,
intermediate, and fast exchange on the appearance of peaks in
NMR spectra of macromolecule-ligand complexes. In the slow
exchange case separate peaks are seen for free and bound forms.
Note the broader peak for the bound ligand because it now
adopts the correlation time of the macromolecule. In the fast
exchange case only an averaged peak is observed.

Table 12.7 shows that the changes in chemical
shifts on ligand binding (for signals either
from the ligand or from the macromolecule)
are in general greater than those for coupling
constants or relaxation rates. Given that 100
s^-1 might represent a typical exchange rate
between free and bound states, it is clear that
individual NMR signals may be found in either
slow, fast, or intermediate exchange on the 
chemical-shift timescale, but it is more likely 
that couplings or relaxation parameters will 
be in fast exchange. Thus, in most cases where 
the term "NMR timescale" is used in the liter¬ 
ature, it refers to the chemical-shift timescale. 
The table also emphasizes that there are two 
types of signals that can be monitored, those 
from the ligand and those from the macromol¬ 
ecule. In general, the typical magnitude of 
changes to chemical shifts or couplings of ei¬ 
ther type of signal on binding are similar, al¬ 
though the changes to ligand signals may be 
larger than those from the macromolecule. 
However, changes to relaxation parameters 
for signals from ligands are much more likely 
to be greater than those for protein signals.
This reflects the sensitivity of relaxation parameters
to molecular mobility: a ligand undergoes
a greater relative change in mobility
on binding than does a protein, given that the 
relative increase in molecular weight in the 
complex is much greater for the ligand than 
for the protein. 

The exchange regime (slow, intermediate, 
or fast) determines how a spectrum of a pro¬ 
tein-ligand mixture changes during a titra¬ 
tion, or as a function of temperature. Figure 
12.14 schematically illustrates the various ex¬ 
change regimes for macromolecule-ligand 
binding interactions. Slow exchange, corre¬ 
sponding to tight binding, is potentially the 
most useful regime, given that much detailed 
information on the nature of a complex can be 
deduced in this case. Nevertheless, fast exchange
also allows valuable kinetic and thermodynamic
parameters to be derived. The
analysis is more complex for intermediate ex¬ 
change and few quantitative studies are at¬ 
tempted for this situation. 

3.2.2.1 Slow Exchange. This situation applies
when the rate of exchange is much slower
than the difference in chemical shifts between
the two states (i.e., k ≪ ν_B − ν_F), where we
now change to a nomenclature using subscripts
to refer to the bound (B) and free (F)
states. It should be understood that the sig¬ 
nals may derive from either ligand or macro¬ 
molecule, so although B is always related to 
the ML complex, F might refer to either free 
ligand (L) or free macromolecule (M). In this 
situation separate peaks are potentially ob¬ 
servable for both free and bound states at 
their respective chemical shifts. Whether such 
signals are actually observed depends on the 
mole ratio of the individual species in a titra¬ 
tion, and on the signals not being obscured by 
overlap or broadening. 

Addition of a ligand to a solution of a pro¬ 
tein results in the appearance of new signals 
attributed to bound protein resonances, with a 
concurrent decrease in the intensity of the 
free protein resonances, reflecting the de¬ 
creased proportion of free protein during the 
titration. Once a stoichiometric mole ratio is 
achieved (usually 1:1, but sometimes 2:1 or 
higher if multiple binding sites are present on 
the protein), peaks from free ligand appear 
with increasing intensity as the excess of free 
ligand increases. 

Figure 12.15. 19F NMR spectra at 282 MHz of the
4-fluorobenzenesulfonamide-carbonic anhydrase I
system at various ratios of enzyme to inhibitor, as
indicated on the traces. The peak at -6 ppm is
caused by bound inhibitor. The enzyme concentration
was 1 mM at pH 7.2 in D2O at 25°C. (Reprinted
with permission from Ref. 83; © 1998, American
Chemical Society.)

From such a titration it is possible to determine
the stoichiometry of the complex, together
with the chemical shifts of the bound
states of the ligand and protein. In 1D NMR
spectra, overlap of peaks makes it difficult to
monitor more than a few resonances from either
species and such studies are most readily
done when there is a well-resolved signal on
one of the interacting species. Selective isotope
labels have been used in the past for such
studies but it is now more common to use uniform
15N- or 13C-labeling of the protein and
detect the chemical shifts in 2D HSQC spectra.
It is often more difficult to label the ligand
but in some cases the presence of rare nuclei
such as 19F can be used to advantage. A good
example is the binding of the inhibitor
4-fluorobenzenesulfonamide to the enzyme carbonic
anhydrase (83). Figure 12.15 shows 19F
spectra of the enzyme inhibitor complex at 
various mole ratios. The broadened peak for 
the bound ligand has a chemical shift of ap¬ 
proximately 6 ppm and is in slow exchange 
with the peak from free ligand at 0 ppm. The 
stoichiometry of the complex in this case is 
2:1, so that no signal from free ligand is visible 
until more than 2 moles of inhibitor are 
present. Addition of increasing amounts of li¬ 
gand results in an increase in the free ligand 
signal, but no change in the bound ligand 
signal. 

Determination of the binding constant
from slow exchange spectra is not usually attempted.
Generally for slow-exchange conditions
to exist in the first place, the binding is
submicromolar in affinity and non-NMR
methods are more suitable for determining affinities
in these cases. NMR studies are done
at millimolar concentrations, making it difficult
to determine K_D with any accuracy for
tight binding systems.

In principle, kinetic information on the
complex can be obtained from slow-exchange
spectra, as seen from the expressions for T2 for
free and bound ligand signals:

$$\frac{1}{T_{2F,\mathrm{obs}}} = \frac{1}{T_{2F}} + k_{\mathrm{off}}\,\frac{p_B}{p_F} \qquad (12.4)$$

$$\frac{1}{T_{2B,\mathrm{obs}}} = \frac{1}{T_{2B}} + k_{\mathrm{off}} \qquad (12.5)$$

Because the linewidth of a peak is related to T2 by

$$LW = \frac{1}{\pi T_2} \qquad (12.6)$$

then measurements of linewidth during a titration
can be used to derive k_off. Equations
12.4 and 12.5 show that, although the linewidth
of the signal from bound ligand is independent
of concentration, that of the free ligand decreases
as more ligand is added. A plot of
linewidth vs. ligand/macromolecule mole ratio
allows k_off to be determined, as illustrated for
example in a study of the 31P linewidths for
the 2'-phosphate of NADP+ binding to dihydrofolate
reductase (84).
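
The dependence of linewidth on mole ratio can be sketched by combining Equations 12.4 to 12.6. The example assumes tight 1:1 binding, so that at a ligand:macromolecule ratio R greater than 1 the bound fraction of ligand is 1/R; this simplification, the function names, and the numerical values are all illustrative assumptions rather than data from Ref. 84.

```python
import math


def free_ligand_linewidth(ratio, k_off, lw_free):
    """Observed linewidth (Hz) of the free-ligand peak in slow exchange.

    ratio   -- ligand:macromolecule mole ratio (must exceed 1)
    k_off   -- dissociation rate constant (s^-1)
    lw_free -- intrinsic free-ligand linewidth, 1/(pi*T2F), in Hz
    Assumes tight 1:1 binding, so p_B = 1/ratio for the ligand.
    """
    if ratio <= 1.0:
        raise ValueError("no free ligand is present until the ratio exceeds 1")
    p_b = 1.0 / ratio
    p_f = 1.0 - p_b
    # Equation 12.4 converted to a linewidth with Equation 12.6.
    return lw_free + k_off * p_b / (math.pi * p_f)


def bound_ligand_linewidth(k_off, lw_bound):
    """Observed linewidth (Hz) of the bound-ligand peak (Equation 12.5)."""
    return lw_bound + k_off / math.pi


for ratio in (1.5, 2.0, 4.0, 10.0):
    width = free_ligand_linewidth(ratio, k_off=10.0, lw_free=1.0)
    print(f"ratio {ratio:4.1f}: free-ligand linewidth = {width:.2f} Hz")
```

Under this assumption the excess broadening of the free-ligand peak is k_off/[π(R − 1)], so a plot of linewidth against 1/(R − 1) would be a straight line whose slope gives k_off.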

Although the determination of off rates is
of significance in assessing the stability of the
complex, the major interest in more recent
studies of complexes in the slow-exchange
limit has centered on determining the complete
geometry of the complex through the use
of intra- and intermolecular NOEs. A recent,
but already classic, example of this approach is 
illustrated by the binding of immunosuppres¬ 
sant peptides such as cyclosporin and FK506 
to their receptors. These types of examples are 
discussed in more detail in Section 3.2.4.2. 

3.2.2.2 Fast Exchange. When exchange between
free and bound states is very fast, observed
NMR parameters are a simple
weighted average of those from the two contributing
states, illustrated by Equation 12.7
for chemical shifts and Equation 12.8 for linewidths.

$$\nu_{\mathrm{obs}} = p_F\,\nu_F + p_B\,\nu_B \qquad (12.7)$$

$$\frac{1}{T_{2,\mathrm{obs}}} = \frac{p_F}{T_{2F}} + \frac{p_B}{T_{2B}} \qquad (12.8)$$

These equations show that, in the fast-exchange
limit, addition of a ligand to a protein
solution will cause a progressive change in
chemical shift. Signals from the protein initially
reflect the free state, but as ligand is
added the population of bound protein increases
and the observed signals move toward
those of the bound state. Similarly, when ligand
signals are first detected they reflect predominantly
the bound state, but with increasing
amounts of ligand they move toward the
chemical shift of the free state. By regression
analysis to Equation 12.7, taking into account
the dependency of the mole fractions on K_D by
the standard quadratic binding equation, it is
possible to obtain estimates of both K_D and the
bound shift. The procedure works best for
rather weakly binding ligands (e.g., millimolar
dissociation constants) (85).
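
A minimal sketch of such an analysis is given below for a signal from the macromolecule. It assumes 1:1 binding, uses the quadratic binding equation to obtain the bound fraction, and applies Equation 12.7. The fitting step is a deliberately simple stand-in for the regression analysis mentioned above: a coarse logarithmic grid search over K_D with the shift difference solved by linear least squares at each trial value. All names and the synthetic data are assumptions made for illustration.

```python
import math


def bound_fraction(m_total, l_total, k_d):
    """Fraction of macromolecule bound, from the 1:1 quadratic binding equation."""
    b = m_total + l_total + k_d
    complex_conc = (b - math.sqrt(b * b - 4.0 * m_total * l_total)) / 2.0
    return complex_conc / m_total


def observed_shift(m_total, l_total, k_d, nu_free, nu_bound):
    """Equation 12.7 for a macromolecule signal in fast exchange."""
    p_b = bound_fraction(m_total, l_total, k_d)
    return (1.0 - p_b) * nu_free + p_b * nu_bound


def fit_titration(m_total, l_totals, shifts, nu_free):
    """Grid search over K_D (1 uM to 100 mM); returns (K_D, nu_B - nu_F)."""
    best = None
    y = [s - nu_free for s in shifts]
    for i in range(101):
        k_d = 10.0 ** (-6.0 + 0.05 * i)
        p = [bound_fraction(m_total, l, k_d) for l in l_totals]
        # Best shift difference for this K_D by linear least squares.
        delta = sum(pi * yi for pi, yi in zip(p, y)) / sum(pi * pi for pi in p)
        sse = sum((yi - delta * pi) ** 2 for pi, yi in zip(p, y))
        if best is None or sse < best[0]:
            best = (sse, k_d, delta)
    return best[1], best[2]


# Synthetic, noise-free titration: 0.5 mM macromolecule, K_D = 1 mM, 50 Hz shift.
m_total = 0.5e-3
l_totals = [0.25e-3 * n for n in range(1, 13)]
shifts = [observed_shift(m_total, l, 1e-3, 0.0, 50.0) for l in l_totals]
kd_fit, delta_fit = fit_titration(m_total, l_totals, shifts, nu_free=0.0)
print(f"fitted K_D = {kd_fit:.2e} M, fitted shift difference = {delta_fit:.1f} Hz")
```

With real data a proper nonlinear least-squares routine would normally replace the grid search. The sketch also hints at why the procedure works best for weak binding: when K_D is far below the macromolecule concentration the titration curve becomes almost independent of K_D.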

When exchange is somewhat slower, but
still within the fast-exchange limit, there is an
exchange contribution to linewidth, as shown
in Equation 12.9:

$$\frac{1}{T_{2,\mathrm{obs}}} = \frac{p_F}{T_{2F}} + \frac{p_B}{T_{2B}} + \frac{p_B\,p_F^{2}\,4\pi^{2}(\nu_B - \nu_F)^{2}}{k_{\mathrm{off}}} \qquad (12.9)$$

In this case a maximum in the broadening of
ligand or protein peaks occurs during the titration
at a mole ratio of approximately 0.3, as
illustrated below.
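
The position of this maximum follows directly from the exchange term in Equation 12.9, which is proportional to p_B(1 − p_B)^2 and therefore peaks at p_B = 1/3. The sketch below checks this numerically; the shift difference of 46 Hz and off rate of 250 s^-1 are borrowed from the terephthalamide example discussed later in this section, and the function name is an assumption.

```python
import math


def exchange_broadening_hz(p_bound, delta_nu_hz, k_off):
    """Exchange contribution to the linewidth (Hz) from Equation 12.9.

    The rate term p_B * p_F^2 * 4 * pi^2 * (nu_B - nu_F)^2 / k_off is
    converted to a linewidth with Equation 12.6 (division by pi).
    """
    p_free = 1.0 - p_bound
    rate = p_bound * p_free ** 2 * 4.0 * math.pi ** 2 * delta_nu_hz ** 2 / k_off
    return rate / math.pi


fractions = [i / 100 for i in range(101)]
widths = [exchange_broadening_hz(p, 46.0, 250.0) for p in fractions]
p_at_max = fractions[widths.index(max(widths))]
print(f"maximum exchange broadening occurs at p_B = {p_at_max:.2f}")
```

For tight binding the bound fraction of the macromolecule is essentially equal to the ligand:macromolecule mole ratio below 1:1, which is why the maximum appears at a mole ratio of about 0.3.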

The spectral changes that occur in the fast-exchange
regime can conveniently be illustrated
by studies on the binding of a series of
terephthalamide ligands to an oligonucleotide
model of DNA. The ligands, referred to as
L(NO2), L(NH2), and L(Gly), were synthesized
as precursors for potential anticancer
agents (86). To establish whether they bind in
the minor groove of AT-rich DNA, a series of
NMR titration experiments was undertaken.

(6) L(NH2): R = NH2
(7) L(NO2): R = NO2
(8) L(Gly): R = NHCOCH2NH2

Figure 12.16 shows an expansion of the aliphatic
region of a series of 1H-NMR spectra of
0.5 mM of the oligonucleotide d(GGTAATTACC)2
to which increasing amounts of
L(NH2) were added (86); the spectra
cover mole ratios of ligand to DNA duplex
ranging from 0:1 to 2.6:1. Although the spectra
are complicated by overlap in some regions,
it is clear that addition of the ligand
causes significant changes to the DNA peaks.
A typical example is seen for the T6 methyl
peak, for which addition of ligand causes both
an upfield shift and broadening of the peak at
certain stages of the titration. The chemical
shift moves monotonically with ligand concentration
up to a mole ratio of 1:1 and then
reaches a plateau, remaining constant as
larger amounts of ligand are added. Broadening
of the peak reaches a maximum at a ligand:DNA
mole ratio of approximately 0.3. Both observations
are consistent with there being
moderately fast exchange on the chemical-shift
timescale between the free and ligand-bound
forms of the DNA in solution. In this
case, the observed spectral peaks reflect neither
the free nor the bound form of DNA, but
are averaged signals.

Ligand peaks are also in fast exchange, as 
seen with the L(NH2) methyl peak, which first
appears at a ligand:DNA ratio of 1.36:1 as a 
shoulder on the overlapped T3 and T7 methyl 
peaks at approximately 1.27 ppm. This peak is 
not initially visible in spectra at low ligand: 
DNA mole ratios because of the small popula¬ 
tion of bound species and the overlapping 
DNA peaks. It moves upfield with increasing 
ligand concentration and, again, represents an 
averaged peak intermediate in chemical shift 
between free and bound forms, reflecting fast- 
exchange kinetics. Eventually, the chemical 
shift of this signal approaches that of the free 
ligand at 1.1 ppm, measured in a separate ex¬ 
periment with a solution of ligand alone. 


In fast-exchange cases such as this it is
possible to obtain an estimate of the dissociation
constant for the complex (K_D) and the
bound chemical shift (ν_B) of DNA resonances
by fitting the observed chemical shift changes
as a function of ligand concentration to Equation
12.7 (85). The parameters that best fit the
experimental data for the T6 methyl peak
were K_D = 1.2 × 10^-6 M and (ν_B − ν_F) = 46
Hz. Limitations on the accuracy of K_D values
derived in this way were described previously
(85).

To further define the thermodynamic constants
associated with binding, the linewidth
data were also quantitatively examined by use
of Equations 12.6 and 12.9. In the case of moderately
fast exchange, a maximum linewidth is
predicted at a ligand:DNA mole ratio of 0.33
(82, 85), and this was indeed observed in the
current case. Derived binding parameters
were K_D < 1.0 × 10^-6 M, k_off = 250 s^-1,
(ν_B − ν_F) = 49 Hz, and LW_B = 12 Hz, consistent
with the values derived from the analysis of
chemical shifts. Subsequent studies with the
related ligand L(NO2) showed similar binding
to L(NH2). However, a third ligand, L(Gly),
was found to bind somewhat more tightly,
with some signals in the intermediate exchange
regime.

Figure 12.16. Expanded regions from 300-MHz ¹H-NMR spectra for complexes between L(NH₂)
and d(GGTAATTACC)₂ recorded at 10°C. The two small peaks at 1.12 and 1.14 ppm arise from an
impurity. Increasing ligand concentration causes an upfield shift of the T6 methyl resonance (a), and
causes the T7 and T3 resonances to become overlapped at later stages of the titration (c). Peak (b) is
an averaged resonance from the ligand methyl groups intermediate in shift between the bound and
free forms of the ligand. (Reprinted with permission from Ref. 86.)

3.2.2.3 Intermediate Exchange. In this regime the rate of exchange between bound and
free states is comparable to the differences in the NMR parameters associated with the
exchange. The spectral peaks often become very broad and analysis is difficult. This
is the case, for example, for L(Gly). In the methyl region of the spectra shown in Fig.
12.17, the T7 CH₃ signal moves upfield and the T3 CH₃ signal moves slightly downfield
with increasing ligand concentration, as seen previously for L(NH₂) and L(NO₂). However,
in contrast to the case for the other ligands, the characteristic broadening of peaks at
intermediate ratios is non-Lorentzian, suggesting kinetics in the intermediate exchange
regime. The T6 CH₃ peak does not shift in the characteristic fast-exchange manner;
instead, a new broad resonance appears close to the expected position of the bound T6 CH₃
chemical shift on the first addition of ligand, and increases in intensity with increasing
ligand concentration. This observation is consistent with the ligand being in slow to
intermediate exchange between the free and bound forms, with k_off ≈ (ν_B − ν_F). Based on
the magnitude of ν_B − ν_F for this resonance, k_off for L(Gly) is estimated to be 50 s⁻¹,
which is significantly slower than that for L(NO₂) and L(NH₂).
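
The comparison of k_off with the frequency difference for a given resonance can be sketched
numerically. The following is a minimal illustration only: the function name
`exchange_regime` and the factor-of-ten window used to decide when two rates are
"comparable" are arbitrary assumptions made here, not values taken from the cited studies,
and k_off (s⁻¹) is compared directly with the shift difference in Hz, as in the text.

```python
def exchange_regime(k_off, delta_nu_hz, window=10.0):
    """Classify the exchange regime for one resonance.

    k_off        -- dissociation rate constant in s^-1
    delta_nu_hz  -- |nu_B - nu_F| for this resonance in Hz
    window       -- hypothetical factor defining "comparable" rates
    """
    if k_off <= 0 or delta_nu_hz <= 0:
        raise ValueError("rate and frequency difference must be positive")
    ratio = k_off / delta_nu_hz
    if ratio > window:
        return "fast"
    if ratio < 1.0 / window:
        return "slow"
    return "intermediate"


# Illustrative numbers: one rate constant, two resonances with different
# bound-minus-free frequency differences.
for label, dnu in [("small shift difference", 4.0), ("large shift difference", 50.0)]:
    print(label, "->", exchange_regime(50.0, dnu))
```

With these illustrative inputs the same rate constant places one resonance in the fast
regime and the other in the intermediate regime, because only the frequency difference
changes.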

At a ligand:DNA ratio of approximately 1:1, the ratio of the integrals of the T6 methyl peak
and the overlapped T3 and T7 methyl peaks is about 1:6. The expected value is 1:2, which
indicates that the bound ligand methyl peak (4 × CH₃) is overlapped with the T7 and T3
methyl peaks, as observed with L(NH₂) and L(NO₂). When the ligand:DNA ratio is
increased beyond 1:1, a new peak appears at about 1.15 ppm and increases in intensity as
the ligand concentration is increased. This new peak corresponds to the methyl peak of
the free ligand, and its appearance in this manner is consistent with slow exchange on the
chemical-shift timescale. To confirm this, spectra of a 2:1 mixture of L(Gly) and
d(GGTAATTACC)₂ were acquired at different temperatures (86), as illustrated in Fig. 12.18.

Figure 12.17. Expansions from the 600-MHz ¹H-NMR spectra for complexes formed between
L(Gly) and d(GGTAATTACC)₂ showing the methyl resonances. The two small peaks at 1.12 and 1.14
ppm are attributed to an impurity. The complex nature of the T6 methyl resonance at ligand:DNA
ratios less than 1:1 (a), and the manner in which signal intensity increases at about 1.15 ppm at
ligand:DNA ratios greater than 1:1 (b), are indicative of intermediate exchange. (Reprinted with
permission from Ref. 86.)

At low temperatures, signals at 1.15 and 1.30 ppm (the latter overlapped with the T7 CH₃
and T3 CH₃ peaks), attributable to the methyl groups from the free ligand and bound ligand,
respectively, are distinguishable. As the temperature is increased, a broad peak appears
between these two signals (at ~1.22 ppm). At the lower temperatures k_off ≤ (ν_B − ν_F), so that
the methyl resonances of the ligand have complex characteristics reflecting slow-intermediate
exchange. At higher temperatures, k_off > (ν_B − ν_F), so the signal appears as a
fast-exchanged average between the free and bound resonances. From a qualitative analysis of
the spectra, k_off for L(Gly) was estimated to be 50-60 s⁻¹ at 283 K.
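
The position of the averaged peak can be checked with a short calculation. This is a sketch
under stated assumptions: binding is taken to be 1:1 and essentially complete, so that in a
2:1 ligand:DNA mixture half of the ligand is bound and half is free; the function name
`fast_exchange_shift` is introduced here purely for illustration.

```python
def fast_exchange_shift(delta_free, delta_bound, fraction_bound):
    """Population-weighted average shift (ppm) in the fast-exchange limit."""
    if not 0.0 <= fraction_bound <= 1.0:
        raise ValueError("fraction_bound must lie between 0 and 1")
    return (1.0 - fraction_bound) * delta_free + fraction_bound * delta_bound


# Free and bound ligand methyl shifts quoted in the text (ppm); the bound
# fraction of 0.5 is the assumption described above.
avg = fast_exchange_shift(delta_free=1.15, delta_bound=1.30, fraction_bound=0.5)
print(round(avg, 3))
```

Under these assumptions the averaged resonance falls midway between the two
low-temperature signals, close to the position of about 1.22 ppm at which the broad peak is
reported to appear.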

The fact that some peaks (e.g., the oligonucleotide T7 and T3 methyl signals) exhibit fast
exchange, whereas others in the same spectrum of the same complex exhibit slow-intermediate
characteristics, is a reflection of the different (ν_B − ν_F) values for different peaks.
This emphasizes the point made earlier that the "exchange regime" is a relative expression
and depends not only on the rate of exchange, but also on the size of the chemical shift
differences involved. In summary, the observations suggest that k_off for binding of L(Gly)
to the oligonucleotide duplex is much slower than that for the other two derivatives. This
provides an illustration of the value of NMR as a quick method for comparing the binding of
different ligands and for confirming ligand-binding hypotheses.

Figure 12.18. Expansions of ¹H-NMR spectra of a 2:1 mixture of L(Gly) and d(GGTAATTACC)₂
acquired at different temperatures. (Reprinted with permission from Ref. 86.)

The change in binding kinetics may be rationalized by considering the different structure
of the L(Gly) ligand relative to the other ligands. It was anticipated that, upon binding
to the minor groove, the terephthalamides would adopt a conformation in which the
substituent on the central ring would form part of the convex edge of the ligands and
therefore be directed toward the "mouth" of the groove. Given this binding arrangement, the
ligand L(Gly) would have a positively charged alkylamine group positioned to interact with
the negatively charged phosphate groups of the DNA backbone. The L(Gly) derivative also has
a bulkier substituent than that of the other ligands and this is also consistent with some
differences in its binding.

3.2.3 NMR Techniques. NMR is a particularly versatile tool for the analysis of
protein-ligand interactions. As well as allowing different nuclei to be observed, it
permits measurement of a range of different NMR parameters, including chemical shifts,
line widths, coupling constants, and relaxation parameters. In addition, there are several
specific NMR techniques that have been applied for the measurement of these parameters. The
techniques that are particularly valuable for the study of macromolecule-ligand
interactions are described in the following sections.

3.2.3.1 Chemical-Shift Mapping. Chemical shifts are exquisitely sensitive markers of the
local charge state and environment. Although it is not possible to construct an accurate
model of a binding site from a knowledge of the chemical shifts of a bound ligand, a
qualitative interpretation of changes in chemical shifts of the macromolecule on binding
provides significant insight into the location of the binding site. Traditionally, such
studies were done using 1D NMR but they are now increasingly done using 2D HSQC spectra. By
simultaneously obtaining information on chemical shifts for a large number of sites in a
macromolecule and seeing which ones change when a ligand binds, and which ones do not, it is
possible to deduce the location of the binding site. This procedure is referred to as
chemical-shift mapping. A prerequisite of the approach is that the chemical shifts have
been assigned. Chemical-shift mapping by use of HSQC spectra is widely used in NMR
screening approaches and we will defer a more detailed discussion of it until Section 4.

Figure 12.19. Chemical-shift perturbations of DNA protons upon ligand binding. The lighter and
darker columns represent shifts attributed to the L(NO₂) and L(NH₂) derivatives, respectively.

The relative simplicity of how chemical shift information localizes binding sites may
be illustrated by continuing with the example introduced above of terephthalamide binding
to DNA. Figure 12.19 shows that, upon binding of the terephthalamides L(NH₂) and
L(NO₂) to d(GGTAATTACC)₂, the DNA protons on the four base pairs between A5 and A8
are perturbed to a much larger degree than protons in the rest of the sequence. It is thus
likely that these four residues form the binding site.
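
The bookkeeping behind this kind of mapping can be sketched in a few lines. The example
below is illustrative only: the shift values are made up for the purpose of the sketch (they
are not the data of Fig. 12.19), and the function name `perturbed_residues` and the 0.1 ppm
threshold are assumptions introduced here.

```python
def perturbed_residues(free_ppm, bound_ppm, threshold=0.1):
    """Return {residue: largest |shift change| in ppm} for residues over threshold.

    free_ppm and bound_ppm map (residue, proton) -> assigned chemical shift.
    """
    largest = {}
    for key, free in free_ppm.items():
        if key not in bound_ppm:
            continue  # not assigned in the complex, so it cannot be mapped
        residue, _proton = key
        change = abs(bound_ppm[key] - free)
        largest[residue] = max(largest.get(residue, 0.0), change)
    return {res: round(val, 3) for res, val in largest.items() if val >= threshold}


# Hypothetical assignments (ppm) for a few protons, free and bound.
free = {("T3", "H6"): 7.20, ("A5", "H2"): 7.10, ("A8", "H2"): 7.50,
        ("A8", "H8"): 8.20, ("C9", "H6"): 7.40}
bound = {("T3", "H6"): 7.18, ("A5", "H2"): 7.30, ("A8", "H2"): 7.75,
         ("A8", "H8"): 8.20, ("C9", "H6"): 7.41}

result = perturbed_residues(free, bound)
print(result)
```

Only residues whose protons move by at least the threshold are returned, which is the
numerical equivalent of reading the tall columns from a perturbation plot. The approach
depends on the assignments being available for both the free and bound states.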

A more detailed analysis allows the binding site to be further localized to the minor, rather
than to the major, groove in the region of these bases. A4, A5, and A8 are the only residues
containing easily detectable minor groove protons (H2). These resonances, which originate
from the floor of the minor groove, are shifted downfield with ligand binding, whereas most
other resonances are shifted upfield. This observation is consistent with the ligands
binding in the minor groove and has been reported for other minor groove binders such as
Hoechst 33258 (87) and SN-6999 (88, 89), where adenine H2 protons on the floor of the
groove experience deshielding ring current effects. However, significant chemical shift
changes were also observed for some major groove protons. This illustrates the general
point that sometimes allosteric effects can cause changes at sites not directly involved in
binding. In the case of DNA, binding perturbations in the major groove have also been
observed for other established minor groove binders such as distamycin (90), netropsin
(91), and Hoechst 33258 (92). Based on NOE and crystallographic data, it was concluded
that the effects were caused by distortions of the B-DNA duplex, including changes in the
"base roll" of residues within the binding site, upon complexation. Electronic effects
arising from the close proximity of charged groups on the ligand to neighboring nucleotides
were also found to perturb major groove resonances.

In the case of the terephthalamides, a comparison of the minor and major groove
perturbations for a particular residue shows that the minor groove protons are affected to a
much greater extent. This is particularly evident for A8, where the H2 proton shifts by
approximately 0.25 ppm and the H8 proton is not affected (Fig. 12.19). It is difficult to
conceive of a binding mode in the major groove that would account for such a large effect on
the minor groove A8 H2 resonance without a simultaneous effect on the major groove protons
of T7 and A8. The observed 1:1 stoichiometry of the complex excludes the possibility that
the ligand binds to the major and minor groove at the same time. It is more likely,
therefore, that binding in the minor groove causes distortion of the DNA structure so that
perturbations are observed for the major groove protons of A5 and T6, but not neighboring
nucleotides.

Other examples of the use of chemical-shift mapping to locate binding sites have been
reported for ligands binding to a range of drug targets, including immunophilins, matrix
metalloproteases, and DHFR. Some of these examples are described in more detail in
section 3.2.5.

3.2.3.2 NMR Titrations. There are a number of advantages in undertaking a titration of
ligand against macromolecule, or vice versa, rather than just examining the final complex.
These include the possibility of distinguishing signals from the individual components on
the basis of intensities at intermediate stages of the titration in the slow-exchange case,
and of obtaining kinetic and thermodynamic parameters associated with the interaction in the
fast-exchange case. Such titrations may be done using either 1D or 2D spectra and are very
useful for establishing the exchange regime of the complex, as described in Section 2.1. A
variety of parameters may be monitored in the titration, although the two most common are
chemical shifts and linewidths. Examples of such titrations are given in Figs. 12.16 and
12.17.
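
For the fast-exchange case, the way a monitored shift responds during a titration can be
sketched with the standard expression for simple 1:1 binding. This is an illustration
under stated assumptions rather than the analysis used in the cited work: a single binding
site is assumed, the observed shift is taken to be the population-weighted average of the
free and bound shifts, and all concentrations and shifts in the example are hypothetical.

```python
import math


def fraction_ligand_bound(ligand_total, dna_total, k_d):
    """Fraction of ligand bound for 1:1 binding (all concentrations in M)."""
    b = ligand_total + dna_total + k_d
    # Physically meaningful root of the quadratic for the complex concentration.
    complex_conc = (b - math.sqrt(b * b - 4.0 * ligand_total * dna_total)) / 2.0
    return complex_conc / ligand_total


def observed_ligand_shift(ligand_total, dna_total, k_d, delta_free, delta_bound):
    """Fast-exchange averaged ligand shift (ppm) at one titration point."""
    p_bound = fraction_ligand_bound(ligand_total, dna_total, k_d)
    return delta_free + p_bound * (delta_bound - delta_free)


# Hypothetical titration: 1 mM duplex, K_D = 1 uM, ligand added stepwise.
dna = 1.0e-3
for ratio in (0.25, 0.5, 1.0, 2.0):
    shift = observed_ligand_shift(ratio * dna, dna, 1.0e-6, 1.15, 1.30)
    print(f"ligand:DNA = {ratio:4.2f}  observed shift = {shift:.3f} ppm")
```

Fitting measured shifts to a curve of this form is one way in which a dissociation
constant can be extracted from a fast-exchange titration; when binding is very tight, the
curve becomes insensitive to K_D and only an upper limit can be obtained.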

3.2.3.3 Isotope Editing and Filtering. Isotope editing provides a powerful way of
distinguishing between the components in a complex without the need for a titration. It is
one of the most useful tools for the study of macromolecule-ligand complexes, and indeed the
background NMR technology that underpins isotope editing was developed specifically for
the study of complexes. The principle of the approach is illustrated in Fig. 12.20 and is
based on the use of isotopes to select for signals from either the ligand or macromolecule,
or signals exclusively linking both of them.

The conceptually simplest approach is to uniformly deuterate the macromolecule,
thereby removing its signals from ¹H-detected NMR spectra, and allowing signals from only
the ligand to be observed. This substantially simplifies the spectrum and allows, for
example, the bound conformation of a ligand to be determined from NOESY data recorded in
D₂O. By rerunning the spectrum in H₂O, additional NOEs to exchangeable amide protons
on the protein may be detected, thereby providing information on contacts between ligand
and protein. Alternatively, ¹⁵N or ¹³C labels may be introduced selectively into either the
ligand or protein and editing techniques used to select only signals attached to these
labels and their proximate protons. This was used in the first example of an isotope-edited
study, in this case to examine the binding of a ¹⁵N-labeled peptide-based inhibitor to
pepsin (93).

Potentially, the most useful approach involves uniform labeling of one of the components
with either ¹⁵N or ¹³C and leaving the other component unlabeled. It is then possible
to edit the spectrum by selecting for interactions (either through bond or through space)
that connect protons that are both one-bond coupled to ¹⁵N or ¹³C. Alternatively, the
spectrum may be filtered to specifically remove such signals, thereby selecting only signals
involving protons coupled to ¹⁴N or ¹²C (i.e., on the unlabeled component). It is generally
easier to uniformly label the protein rather than the ligand, and editing methods are highly
efficient, thus making it easy to visualize just the protein. However, because ligand
signals are often of interest, filtering experiments play a valuable role in visualizing
them. Unfortunately, filtering experiments are more susceptible to artifacts than are
editing experiments, although there have been recent advances in reducing artifacts (94).

Another possibility is to use half-edited/half-filtered 2D experiments to detect NOEs
that specifically involve interactions between protons attached to ¹⁵N or ¹³C and those that
are not. This approach is used, for example, to detect intermolecular NOEs between a
labeled protein and an unlabeled ligand. Examples of isotope editing/filtering are given in
section 3.2.4.
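
The selection logic of editing, filtering, and half-edited/half-filtered experiments can
be summarized as simple bookkeeping on which heteronucleus each proton is attached to. The
sketch below is a conceptual model only and says nothing about pulse sequences; the proton
names, the NOE list, and the function names are all hypothetical.

```python
LABELS = {"13C", "15N"}  # isotopes selected for by an editing step


def edited(attached):
    """Protons that survive editing: those bonded to 13C or 15N."""
    return sorted(name for name, isotope in attached.items() if isotope in LABELS)


def filtered(attached):
    """Protons that survive filtering: those bonded to 12C or 14N."""
    return sorted(name for name, isotope in attached.items() if isotope not in LABELS)


def half_edited_half_filtered(noes, attached):
    """NOE pairs linking one label-attached proton and one that is not."""
    return [(a, b) for a, b in noes
            if (attached[a] in LABELS) != (attached[b] in LABELS)]


# Hypothetical complex: uniformly labeled protein, unlabeled ligand.
attached = {
    "Val12 HN": "15N", "Leu45 HD1": "13C",   # protein protons
    "Lig H3": "12C", "Lig HN1": "14N",       # ligand protons
}
noes = [("Val12 HN", "Leu45 HD1"), ("Leu45 HD1", "Lig H3"), ("Lig H3", "Lig HN1")]

print(edited(attached))
print(filtered(attached))
print(half_edited_half_filtered(noes, attached))
```

In this picture editing returns only protein protons, filtering returns only ligand
protons, and the half-edited/half-filtered selection keeps only the NOE that crosses the
protein-ligand interface.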

Figure 12.20. Isotope editing and filtering can be used to select signals from either the ligand or the
protein. (a) Normal protein and ligand with no filtering or editing. (b) Selection of the ligand signals
by ²H labeling of the protein. (c) Selection of protein or ligand signals by ¹³C and/or ¹⁵N labeling/
editing. (d) Removal of protein or ligand signals by ¹³C or ¹⁵N filtering.

3.2.3.4 NOE Docking. In many cases the study of a complex may follow a previous
structure determination of the isolated macromolecule, and in that case it may be possible
to determine much information about a complex by obtaining a relatively small number of
NOEs linking the ligand and macromolecule. Gradwell and Feeney (95) recently analyzed
factors important in such NOE docking experiments. In their analysis, a high resolution
X-ray structure of a protein-ligand complex was used to simulate loose distance restraints of
varying degrees of quality that might typically be estimated from experimental NOE
intensities. These simulated data were used to examine the effect of the number,
distribution, and representation of the experimental constraints on the precision and
accuracy of the calculated structures. A standard simulated annealing protocol was used, as
well as a more novel method based on rigid-body dynamics. The results showed some parallels
with those from similar studies on complete protein NMR structure determinations, but it was
found that more constraints per torsion angle are required to define docked structures of
similar quality. This is because the conformation and orientation of the ligand are defined
only by NOEs and not by covalent attachment, as is the case for amino acid side chains in a
protein structure. The effectiveness of different NOE-constraint averaging methods was
explored and the benefits of using "R⁻⁶ averaging" rather than "center averaging" with small
sets of NOE constraints were demonstrated. With these considerations in mind it appears
that NOE docking can be a very cost-efficient procedure for defining the environment,
orientation, and conformation of ligands.
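
The difference between the two averaging schemes for a group of equivalent protons (for
example, a methyl group) can be illustrated with a short calculation. This is a sketch and
not the protocol of Ref. 95: the mean form of the R⁻⁶ average is assumed here (some
structure-calculation programs use a sum instead), and the coordinates are invented for
illustration.

```python
import math


def r6_average(target, group):
    """Effective distance <r^-6>^(-1/6) from target to a group of protons."""
    inv6 = [math.dist(target, atom) ** -6 for atom in group]
    return (sum(inv6) / len(inv6)) ** (-1.0 / 6.0)


def center_average(target, group):
    """Distance from target to the geometric center of the group."""
    center = tuple(sum(axis) / len(group) for axis in zip(*group))
    return math.dist(target, center)


# Hypothetical geometry (angstroms): one ligand proton at the origin and
# two equivalent protein protons at 3 and 5 angstroms along x.
ligand_proton = (0.0, 0.0, 0.0)
protein_group = [(3.0, 0.0, 0.0), (5.0, 0.0, 0.0)]

print(round(r6_average(ligand_proton, protein_group), 2))
print(round(center_average(ligand_proton, protein_group), 2))
```

Because the NOE falls off steeply with distance, the R⁻⁶ average is weighted toward the
nearest proton of the group, whereas center averaging treats the group as a single
pseudo-atom at its centroid. The two measures therefore differ most when the group is
close to the ligand and spans a range of distances.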
3.2.4 Selected Examples. Applications of the various NMR techniques described are
now illustrated with selected examples. The examples have been chosen to give a broad
perspective on the types of NMR experiments that can be done and the types of information
they provide. Specifically, the first example covers the case of drug-nucleic acid binding
and focuses on more traditional NMR experiments, involving relatively standard
homonuclear methods. The second example covers binding of moderately large ligands to
immunophilins and highlights modern isotope editing techniques. The third example, covering
ligand binding to a matrix metalloproteinase, also highlights the importance of these
techniques and shows how relatively simple spectra involving ¹⁹F-containing ligands can be
very informative. The fourth example describes ligand binding to DHFR, one of the
most extensively studied systems by NMR, and illustrates the derivation of a broad range
of kinetic and geometric information on intermolecular complexes. The final example, on
HIV protease, describes how NMR complements X-ray studies and provides information
on dynamic motions within complexes.

3.2.4.1 DNA-Binding Drugs. The NMR approaches that have been used to examine the
interactions of minor groove binding drugs with DNA can be illustrated with studies
on the bisbenzimidazole-based compound Hoechst 33258 (9). It has been used widely as
a fluorescent cytological DNA stain and is also active as an anthelmintic agent. It has
activity against intraperitoneally implanted L1210 and P388 leukemias in mice (96).

Footprinting studies (96) have shown that sequences of four AT base pairs are a
prerequisite for strong binding to DNA, consistent with similar observations for other
structurally related molecules such as distamycin and netropsin (87, 90, 91, 97-99). The
first structural studies of Hoechst 33258 complexed to short sequences of synthetic
oligonucleotides were done using X-ray crystallographic methods (100-102). NMR and further
X-ray studies followed (92, 103-107). Three of the X-ray studies (100, 101, 103) used the
EcoRI sequence d(CGCGAATTCGCG)₂ and another (102) used the sequence
d(CGCGATATCGCG)₂. Both sequences fulfil the requirement of at least four consecutive AT
base pairs, and the resulting complexes showed similar modes of binding. In all of the X-ray
studies, the Hoechst ligand was found to bind to the minor groove.

The NMR studies of complexes between Hoechst 33258 and oligonucleotide sequences
provided complementary information to the crystal structure data (92, 103-106). Because
the binding is reversible, the NMR data offer the opportunity to derive information about
the kinetics of the interaction. As with the crystallographic studies, the oligonucleotide
sequences were designed to contain runs of AT base pairs. Some NMR studies were
performed with dodecanucleotide sequences used in crystallographic studies, including
d(CGCGAATTCGCG)₂, which allowed a direct comparison with the crystallographic data.
Experiments were also performed with sequences specifically designed to investigate
different aspects of the interaction. The sequence d(CTTTTGCAAAAG)₂ was designed to offer
two binding sites, and it was shown that two Hoechst molecules interacted with the DNA
duplex in symmetry-related orientations at the 5'-TTTT-3' and 5'-AAAA-3' sites (92).

3.2.4.1.1 Stoichiometry and Kinetics. The starting point in studies of ligand-DNA
complexes is usually a titration experiment to establish the nature and stoichiometry of the
complex. Complexes between the ligand and DNA duplex are obtained by adding small
aliquots of ligand solution to a sample of the DNA duplex, with one-dimensional ¹H NMR
spectra acquired after each addition. The effects observed on the NMR spectrum after
each addition reveal whether an interaction is taking place and allow the interaction to be
characterized as fast or slow exchange on the NMR timescale. The stoichiometry of the
interaction can also be determined from the titration.

Figure 12.21. 1D ¹H-NMR spectra (recorded at 20°C) illustrating the thymine methyl region for the
symmetrical ligand-free duplex (a) and for the 1:1 Hoechst:d(GGTAATTACC)₂ complex (b), which is
no longer symmetrical because of the ligand binding; x corresponds to a small impurity peak. The
DNA strands are numbered to the right of the spectra and the approximate location of the ligand is
indicated by a black bar. (Reprinted with permission from Ref. 105. Copyright 1993, Blackwell
Publishing Science.)

In general, the addition of Hoechst 33258 to the oligonucleotide duplexes causes a
decrease in the intensity of free DNA resonances and a concomitant increase in the intensity
of new resonances, which appear in previously unoccupied spectral regions. This is consistent
with the free and bound forms of the DNA duplex being in slow exchange with each
other. For example, when Hoechst 33258 is added to d(GGTAATTACC)₂, the free DNA
signals completely disappear at a DNA:drug ratio of 1:1, and the number of new resonances
is twice the number of previously observed free DNA resonances (Fig. 12.21). This is a
common feature of complexes with 1:1 stoichiometry and reflects a loss of the dyad
symmetry of the duplex attributed to ligand binding.

Figure 12.22. Aromatic region of NOESY spectrum of a 1:1 mixture of Hoechst vs.
d(CTTTTCGAAAAG)₂ recorded with a 200-ms mixing time. Chemical exchange cross peaks between
protons of the free DNA and the 2:1 Hoechst:DNA complex are labeled with their identifying
base pair. Below the diagonal the H6 and H8 cross peaks are shown, whereas those of the
adenine H2 resonances are highlighted in the upper portion of the figure and labeled with a
subscript 2. (Reprinted with permission from Ref. 92. Copyright 1990, Oxford University Press.)

Upon addition of Hoechst 33258 to d(CTTTTCGAAAAG)₂, the free DNA signals
completely disappeared at a ratio of 2:1 drug:DNA and there was no doubling of the number
of DNA resonances in the spectrum (92). From this, it could be concluded that two molecules
were bound per duplex in a manner that retained the dyad symmetry of the DNA duplex.
The binding was also determined to be cooperative, in that no intermediate 1:1 complex
was detected (92). The formation of a 1:1 complex would have resulted in a very complicated
spectrum at intermediate ligand:DNA ratios, given that resonances arising from the free
DNA, the 1:1 complex, and the 2:1 complex would have produced four times as many
observable peaks relative to the free DNA species. At intermediate ligand concentration,
however, only two sets of peaks arising from DNA molecules were detected. In the 2:1
complex only four thymine methyl resonances were detected (1.0-1.5 ppm), as expected for a
symmetrical DNA duplex. These are all overlapped in the free DNA spectrum. In the 1:1
mixture, only signals from free DNA and from the 2:1 complex were detected.
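
The peak-counting argument can be made explicit with a small calculation. The sketch below
assumes slow exchange between all species, that a dyad-symmetric species gives one set of
strand resonances, and that an asymmetric species gives two; the function name
`methyl_peak_count` is introduced here for illustration only.

```python
def methyl_peak_count(thymines_per_strand, species):
    """Count thymine methyl resonances expected for a mixture in slow exchange.

    species is a list of (name, dyad_symmetric) tuples. A symmetric species
    contributes one resonance per thymine in a strand; an asymmetric species
    contributes two, because its strands are no longer equivalent.
    """
    total = 0
    for _name, dyad_symmetric in species:
        distinct_strands = 1 if dyad_symmetric else 2
        total += thymines_per_strand * distinct_strands
    return total


# Each strand of the dodecamer contains four thymines.
free_only = [("free DNA", True)]
noncooperative = [("free DNA", True), ("1:1 complex", False), ("2:1 complex", True)]
cooperative = [("free DNA", True), ("2:1 complex", True)]

print(methyl_peak_count(4, free_only))
print(methyl_peak_count(4, noncooperative))
print(methyl_peak_count(4, cooperative))
```

With an intermediate 1:1 complex present the count is four times that of the free duplex,
whereas cooperative binding gives only two sets of resonances, which is the pattern
reported at intermediate ligand concentrations.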

The reversible nature of the Hoechst:DNA interaction is illustrated by the observation of
chemical-exchange cross peaks in NOESY spectra of mixtures of free and complexed
oligonucleotides (92, 104). This may be seen in the NOESY spectrum of a mixture of free and
complexed d(CTTTTCGAAAAG)₂, shown in Fig. 12.22, in which many chemical-exchange
cross peaks are observed between resonances arising from the free and bound
oligonucleotide. In a NOESY spectrum acquired at lower temperature, the intensity of these
chemical-exchange cross peaks is significantly reduced, indicating that the exchange is
slowed at lower temperatures. The exchange rate was estimated to be <10 s⁻¹ at 10°C (92).

The ability to observe such dynamic exchange phenomena is one of the strengths of
NMR relative to X-ray crystallography, and several examples of these phenomena are
described later in the chapter.

3.2.4.1.2 Binding Site. A combination of chemical shift and NOE information can be
used to locate and characterize binding sites. Chemical-shift differences between
resonances arising from free and bound forms of DNA are indicative of the nature of the
interaction. In all studies of the Hoechst complexes described above (92, 104-107)
significant changes to the chemical shifts of thymine H1' protons and adenine H2 protons
were observed, in contrast to the generally small perturbations observed for the base H8/H6
and CH₃ resonances located in the major groove. Perturbations of this nature are consistent
with binding to the minor groove. In some instances, significant perturbations were
observed for major groove protons located well within the binding site, reflecting changes
in the conformation of the DNA duplex (e.g., base roll, propeller twisting) (92, 106).

Figure 12.23. Schematic representation of ligand-induced ring-current effects on nucleotide
protons that form the walls (deoxyribose H1') and floor (adenine H2) of the minor groove.
(+), shielding effects; (-), deshielding effects. (Reprinted with permission from Ref. 105.
Copyright 1993, Blackwell Publishing Science.)

Further evidence of minor groove binding is provided by the fact that resonances arising
from protons on the floor of the groove, such as the adenine H2 and imino resonances, are
shifted downfield, whereas resonances from protons on the minor groove walls, such as the
H1' protons, are shifted upfield. This is a consequence of the ligand being inserted
edge-on into the minor groove. The deoxyribose protons that form the walls of the minor
groove are positioned above the π-plane of the aromatic rings and consequently receive
upfield perturbations to their chemical shifts. Protons positioned on the floor of the
groove, however, generally lie in the plane of the aromatic rings and experience downfield
perturbations to their chemical shifts, as illustrated in Fig. 12.23.
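
The direction of these ring-current shifts can be rationalized with the simple
point-dipole picture of an aromatic ring, in which the shift change scales with the
geometric factor (1 − 3cos²θ)/r³, where θ is the angle between the ring normal and the
vector to the proton. The sketch below evaluates only the sign of that factor; it is a
qualitative illustration, not a quantitative ring-current calculation, and the function
name is an assumption made here.

```python
import math


def ring_current_direction(angle_from_normal_deg):
    """Direction of the ring-current shift in a point-dipole picture.

    The factor (1 - 3 cos^2 theta) is negative above the ring plane
    (shielding, upfield shift) and positive in the ring plane
    (deshielding, downfield shift). theta is measured from the ring normal.
    """
    theta = math.radians(angle_from_normal_deg)
    factor = 1.0 - 3.0 * math.cos(theta) ** 2
    if factor < 0:
        return "upfield (shielded)"
    if factor > 0:
        return "downfield (deshielded)"
    return "no net shift"


# Wall protons sit roughly above the ring face; floor protons lie roughly
# in the ring plane when the ligand is inserted edge-on.
print("groove wall (theta = 0):  ", ring_current_direction(0.0))
print("groove floor (theta = 90):", ring_current_direction(90.0))
```

In this picture a proton directly above the ring face is shielded and one in the ring
plane is deshielded, matching the upfield shifts of the H1' protons and the downfield
shifts of the adenine H2 protons.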

The magnitude of chemical shift changes is a strong indicator of the location of the
binding site. In the case of the 2:1 complex with d(CTTTTCGAAAAG)₂ (92), the largest
chemical shift changes occur over the 5'-TTTT-3' and 5'-AAAA-3' regions of the duplex. In the
case of 1:1 complexes, where the DNA duplex contains an AT base-pair segment located at
the center of the sequence, greater chemical shift perturbations are observed for
resonances in that region (104-106), consistent with the binding site being located there.

Assignment of the bound ligand and DNA resonances enables the identification of
intermolecular NOEs, which are required for a precise determination of the binding site. The
interaction of Hoechst 33258 with the oligonucleotides produced a large number of
intermolecular NOEs (~25-30), placing considerable constraints on the structure of the
complex and enabling the orientation of the ligand within the binding site to be
determined. The NOE contacts observed for different complexes have a few features in common.
The contacts generally involve DNA protons associated with the minor groove, such as
ribose H1' and adenine H2, clearly locating Hoechst in the minor groove. Protons of all
four spin systems of the ligand show NOEs to protons of the DNA, demonstrating that the
interaction occurs along the entire length of the drug. Typically, protons along one edge of
the ligand (e.g., NH and H4'/H4") exhibit close contacts to protons on the floor of the minor
groove, showing that the bound drug is crescent shaped and isohelical with the DNA (92,
104-106).

Models of the interaction of Hoechst 33258 with the oligonucleotides studied were
generated based on the intermolecular NOEs. The models of the 1:1 complexes indicated that
the ligand interacted with the four AT base pairs located at the center of the sequence.
Interestingly, there was no evidence for interactions with GC base pairs on the periphery of
the binding sites. In the 2:1 complex reported by Searle and Embrey (92), the array of
contacts observed located the ligand in the minor groove at the center of the 5'-TTTT-3' and
5'-AAAA-3' sites, as illustrated in Fig. 12.24.

As well as defining the location of the binding site, intermolecular NOEs can be used to
determine the orientation of the ligand at that site. In the case of the 2:1 complex, the
N-methylpiperazine moieties were found to point toward the center of the duplex, as indicated
by NOEs between the protons from the piperazine ring and the 5'-terminus of the adenine
tract (Fig. 12.24). Corresponding NOEs were also observed between the drug phenolic
protons and the 5'-terminus of the thymine tract, as well as the 3'-terminus of the adenine
tract of the complementary strand. This model did not indicate any interaction with the
central GC base pairs (92).

Figure 12.24. Schematic representation of Hoechst 33258 bound to the minor groove of the
5'-TTTT sequence. (a) Highlights of some of the NOEs that determine the position and orientation of
the Hoechst molecule within the minor groove. (b) Intermolecular hydrogen-bonding scheme.
Molecular-modeling studies with an idealized B-DNA helical structure indicate that the benzimidazole
H3' is capable of forming bifurcated interactions with A11 N3 and T3 O2, whereas the benzimidazole
H3" hydrogen bonds in a similar manner, but with A10 N3 and T4 O2. In the proposed model of the
complex, these distances fall within 3.5 Å and are thus within acceptable hydrogen-bonding limits.
(Reprinted with permission from Ref. 92. Copyright 1990, Oxford University Press.)

The orientation of the ligand was similarly determined in the 1:1 complexes based on
intermolecular NOEs between protons located at the extremities of the Hoechst molecule and
protons of the binding site. For example, in the interaction with d(GTGGAATTCCAC)₂, Fede
et al. (106) reported NOEs between protons from the piperazine moiety and the H2 and
H1' protons of the dinucleotide fragment d(A5T5)*d(A6T6).

3.2.4.1.3 Dynamic Processes. The binding of the Hoechst molecule to the
self-complementary oligonucleotide duplexes in a 1:1 ratio lifts the dyad symmetry of the
duplexes so that two sets of DNA resonances are observed. This indicates that the drug is in
slow exchange between the free and the bound forms. Close examination of the 2D NOE data,
however, reveals the presence of chemical-exchange cross peaks between symmetry-related
protons on opposite sides of the dyad axis of the DNA duplex. The mechanism by which
this occurs has been described as dissociation of the Hoechst molecule from the duplex,
followed by a 180° reorientation and rebinding (105, 106). The self-complementary nature of
the sequences ensures that the same complex is formed for either ligand orientation but
with the net effect of interchanging the two strands with respect to the orientation of the
Hoechst molecule. The rate at which this process occurs was estimated using cross-peak
intensities in the NOESY spectrum (106). When interacting with d(GGTAATTACC)₂ and
d(GTGGAATTCCAC)₂, the lifetime of the complex in each state (1/k_ex) was reported to
be approximately 0.8 and 0.45 s, respectively (105, 106). These values indicate a small but
significant difference in the affinity of Hoechst for TAATTA and GAATTC sites.

Intramolecular dynamic processes that are 
fast on the NMR timescale are also observable 
in the 1 H-NMR spectrum of the bound 
Hoechst molecule. Resonance averaging is ob¬ 
served for the H2/H6 and H3/H5 protons of 
the phenol group, which is consistent with the 
environments on either side of the ring being 
averaged by rapid ring-flipping motions about 
the C4-C2' axis. This occurs despite the apparent
tight fit between the phenyl ring and the
walls of the minor groove, which, in a static
model of the complex, must present a large
barrier for rotation. It was estimated (105)
that the rate for this process is as high as 1000
s⁻¹. This is much higher than the rate of
interconversion between free and bound forms
of the duplex; thus, dissociation of the drug 
from the complex cannot be the rate-limiting 
factor for phenol ring flipping. Dynamic fluc¬ 
tuations of the DNA conformation are more 
likely to provide the rate-limiting step. 
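
As a rough numerical check on the timescales discussed in this section, the reported lifetimes of the complex can be converted to rate constants (k_ex = 1/lifetime) and compared with the estimated ring-flip rate. The following Python sketch uses only the values quoted in the text; the function and variable names are illustrative and not taken from the cited studies.

```python
def exchange_rate(lifetime_s: float) -> float:
    """Convert a state lifetime (s) into a first-order rate constant (s^-1)."""
    if lifetime_s <= 0:
        raise ValueError("lifetime must be positive")
    return 1.0 / lifetime_s


# Lifetimes of the complex in each state, as quoted in the text (s).
LIFETIMES_S = {"d(GGTAATTACC)2": 0.8, "d(GTGGAATTCCAC)2": 0.45}
# Upper estimate of the phenol ring-flip rate quoted in the text (s^-1).
RING_FLIP_RATE = 1000.0

rates = {seq: exchange_rate(tau) for seq, tau in LIFETIMES_S.items()}
ratios = {seq: RING_FLIP_RATE / k for seq, k in rates.items()}

for seq, k in rates.items():
    print(f"{seq}: k_ex = {k:.2f} s^-1; ring flip ~{ratios[seq]:.0f}x faster")
```

With the quoted lifetimes, the reorientation rates are roughly 1.25 and 2.2 s⁻¹, so ring flipping at up to 1000 s⁻¹ is several hundred times faster than the dissociation-rebinding process.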

3.2.4.1.4 Summary of Solution Studies. The 
data obtained from these NMR studies are 
consistent with the bound ligand fitting 
tightly within the minor groove of AT tetram- 
ers, with the aromatic rings of the ligand being 
roughly coplanar. The AT tract provides the 
key recognition features required for binding, 
including the narrowness of the minor groove. 
The importance of van der Waals interactions 
is evident, given the large number of NOE con¬ 
tacts between the ligand and the walls and 
floor of the groove. Hydrogen bonding also 
plays a significant role in stabilizing the
interaction, as do electrostatic interactions
between the positively charged piperazine ring
and the minor groove. Electrostatic interac¬ 
tions are also likely to play a significant role in 
orienting the ligand within the binding site, as 
shown in the 2:1 complex, where the pipera¬ 
zine rings point toward the center of the du¬ 
plex where the positive charge is best stabi¬ 
lized (92). The information derived from these 
studies, as well as from NMR studies of the 
interactions of other minor groove binders 
with DNA, is useful for the design of ligands 
with altered specificity or increased binding 
affinity, with the overall goal being the devel¬ 
opment of novel drugs. 

3.2.4.2 Immunophilins: Studies of FK506 
Analog Binding to FKBP. Some of the most de¬ 
tailed investigations of the interaction be¬ 
tween ligands and their target proteins have 
been made for the immunophilin class of pro¬ 
teins. The major FK506 binding protein 
(FKBP) has a molecular mass of about 11.8 
kDa, whereas cyclophilin (Cp) has a mass of 
about 17 kDa. These proteins are unrelated in 
amino acid sequence but both have peptidyl-prolyl
cis-trans isomerase activities that are
inhibited by immunosuppressants that block
signal transduction pathways leading to
T-lymphocyte activation. FK506 (10) and
rapamycin (11) inhibit the cis-trans isomerase
activity of FKBP, whereas cyclosporin A
(structure shown in Fig. 12.3) inhibits that of
Cp. NMR has contributed significantly to the
understanding of binding interactions to both
proteins.

(10) FK506 R = CH2CH=CH2
(12) Ascomycin R = CH2CH3

Initial studies on FK506 focused on the
structure of the free ligand to aid in the design
of further analogs (108-110). However, it was
established from studies of the cyclosporin
A-cyclophilin complex that the conformation of a
molecule bound to its target site may be very
different from that in the free state (111-113).
In addition, analog design is assisted by knowing
the location of the binding region of the
ligand. Studies were therefore undertaken to
determine the bound state of the ligand as well
as to identify those portions of the drug
interacting with the binding protein.

Figure 12.25. Three-dimensional structure of
ascomycin bound to FKBP. Protons on the ligand
that showed NOEs to the protein are denoted by
a black shading of the carbons to which they
are attached. Although no NOEs were observed
from the protons at position 3 to the protein,
the upfield shift of their resonances, -1.09
and 0.25 ppm, suggests that they are in close
proximity to an aromatic region of FKBP.
(Reprinted with permission from Ref. 116;
© 1991, American Chemical Society.)

The first investigations involved the analysis
of ¹³C carbonyl chemical shifts of C8 and C9
and the ¹H chemical shifts of the piperidine
ring of FK506 bound to FKBP (114,115). The
upfield shifts of the piperidine ring protons, as
well as NOEs observed between these protons
and aromatic protons of FKBP, suggested that
the binding site on FKBP resided in an
aromatic-rich domain, and allowed a putative
binding site on FKBP to be proposed. It was
also evident that the pipecolinyl functionality
of FK506 and analogs was involved in the
binding face of the ligand.


In another study (116), a uniformly
¹³C-labeled ascomycin (12) was prepared, allowing
the bound conformation of ascomycin to be
determined in the presence of FKBP. The enhanced
¹³C signals were used to edit the ¹H
NOESY spectra used for the structural analysis.
Not only were the assignments of side-chain
methyls made possible by the ¹³C enrichment,
but ligand resonances could be
distinguished readily from those of the pro¬ 
tein. The conformation of the ligand was de¬ 
termined from NOEs observed in a 3D 
HMQC-NOESY spectrum. The resulting asco¬ 
mycin structure (Fig. 12.25) differed consider¬ 
ably from that of the uncomplexed FK506 ob¬ 
tained by X-ray crystallography, but was 
similar to that of rapamycin. In particular, the 
bound ascomycin displayed a trans orientation 
of the 7,8-amide bond, whereas this bond is cis 
in free FK506 and trans in rapamycin. The 
backbone structure of the macrocyclic ring
differed from that of uncomplexed FK506, but
showed a similarity in the piperidine ring
region to that of rapamycin. This study also
showed that both the piperidine ring and the 
pyranoyl moiety of ascomycin are involved in 
the binding interface in the complex with 
FKBP. Ligand protons that show NOEs to the 
protein are in bold in Fig. 12.25. X-ray studies 
since have confirmed these results for both the 
FK506-FKBP and rapamycin-FKBP com¬ 
plexes, showing the trans orientation of the 
ligand amide bond in the bound conformation, 
and verifying the involvement of the piperi¬ 
dine and pyranoyl regions of the ligand in the 
binding interface (117). 

The binding site of the FKBP complex has 
also been investigated through use of NMR 
spectroscopy. Michnick et al. (118) and Moore
et al. (119) solved the structure of uncomplexed
FKBP by use of ¹H-NMR methods. Although
spectral overlap did not allow every
structural constraint present to be identified 
unambiguously, convergent structures defin¬ 
ing the global fold of the 107-residue FKBP 
protein were obtained. Previous biochemical 
data allowed the extensive aromatic cluster 
within the core of the structure to be identified 
as the ligand-binding pocket. The loop regions 
of the protein between residues 37-43 and 83- 
90, situated at the open end of the binding 
pocket, were also of interest. The loops were 
the least well defined regions of FKBP and 
were thought to be flexible, and perhaps
involved in the binding interaction. Examination
of ¹H and ¹⁵N chemical-shift changes on
addition of ligand supported this notion and 
suggested that significant structural changes 
in these loop regions occurred upon ligand 
binding (118). 

In a later study, a high resolution structure 
of the complete ascomycin-FKBP complex was 
calculated by heteronuclear 3D and 4D NMR 
by Meadows et al. (120). Uniformly labeled 
[¹⁵N]FKBP and [¹³C,¹⁵N]FKBP were prepared
and incubated with unlabeled ascomycin
to form the complexes. Three-dimensional
NOESY-HSQC spectra, resolved according to
¹⁵N shifts, were used to obtain the NH-NH
NOEs within FKBP. CH-NH NOEs were derived
from a 4D [¹³C,¹H,¹⁵N,¹H]-NOESY spectrum
of the doubly labeled material in H2O
and CH-CH NOEs from the same experiment
repeated in D2O. Hydrogen bond constraints
were obtained by the identification of slowly
exchanging amide protons from a series of
HSQC spectra acquired over several days.
Torsional angle constraints were obtained
from coupling constants measured in a 2D
HMQC-J spectrum of [U-¹⁵N]FKBP/ascomycin.
In all, 1958 distance constraints were applied
to the structure calculation, with the extra
resolution afforded by isotopic labeling, as
compared with the 590 and 1047 restraints 
used in earlier homonuclear studies (118, 
119). Restraints defining the structure of
bound ascomycin were obtained from the previously
reported data of Petros et al. (116) and,
along with the intermolecular NOE-derived 
distance constraints also reported in their 
study, the complete ascomycin/FKBP solution 
structure was calculated. 
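
To put the restraint counts quoted above in perspective, they can be expressed per residue of the 107-residue protein. The Python sketch below is simple arithmetic on numbers given in the text. Dividing every count by the protein length alone is a rough normalization and an assumption of this example, and the labels are illustrative because the text does not say which homonuclear study used which count.

```python
N_RESIDUES = 107  # length of FKBP given in the text

# Distance-restraint counts quoted in the text. The text does not say which
# homonuclear study used which count, so they are labeled by value only.
RESTRAINTS = {
    "heteronuclear 3D/4D study (120)": 1958,
    "homonuclear study, 590 restraints": 590,
    "homonuclear study, 1047 restraints": 1047,
}

# Rough density of restraints per protein residue.
per_residue = {label: count / N_RESIDUES for label, count in RESTRAINTS.items()}

for label, value in per_residue.items():
    print(f"{label}: {value:.1f} restraints per residue")
```

On this crude measure the heteronuclear study used roughly two to three times as many restraints per residue as the earlier homonuclear work.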

The extra detail afforded by the multi¬ 
dimensional NMR approach allowed the 
ligand-protein contact area to be located un¬ 
ambiguously and even specific intermolecular 
hydrogen bonds identified. The structure of 
the complexed FKBP was essentially similar 
to that of the uncomplexed structure, except 
that the "ill-defined" loop regions between 
residues 36-45 and 78-92 were found to
adopt well-defined conformations in the complexed
proteins, as anticipated by previous
studies. Although this difference may partially 
be a result of the differences in resolution 
achieved in the complexed and uncomplexed 
FKBP NMR studies, generally it was thought 
that binding involved some rearrangement of 
the 36-45 and 78-92 loops. This provides a 
good example of the dynamic nature of protein 
binding as revealed by NMR spectroscopy. 

The dynamic aspects of the ligand-FKBP 
complex formation were pursued by Cheng 
et al. (121) through analysis of ¹⁵N-NMR
relaxation data. In particular, the increased
backbone mobility for several residues 
within the 36-45 and 78-95 loops compared 
with that of the rest of the protein was 
noted. From analysis of the ¹⁵N relaxation
rates of FKBP complexed with FK506, it was 
found that flexibility was restricted along 
the entire polypeptide chain (122). This confirmed
the proposition that the binding
interaction of FKBP with ligand involves
stabilization and structuring of the protein
loops adjacent to the binding site. 

In summary, it was possible not only to define
the free and bound conformations of the
ligand but also to identify the two binding in¬ 
terfaces involved in the interaction and dem¬ 
onstrate a reduction in protein mobility in a 
defined region of the protein upon binding. 
This level of analysis was possible because of 
the tight binding of the FKBP-ligand complex, 
its small size, and the availability of labeled 
species. The information proved to be comple¬ 
mentary to X-ray crystallographic studies and 
will help to clarify the role of FKBP complex 
formation in immunoregulation. 

3.2.4.3 Matrix Metalloproteinases. Matrix
metalloproteinases (MMPs), including stro- 
melysin, collagenase, and gelatinase, are in¬ 
volved in tissue remodeling associated with 
embryonic development, growth, and wound 
healing. Unregulated or overexpressed MMPs 
have been implicated in several pathological 
conditions, including arthritis and cancer, and 
inhibitors of stromelysin and other MMPs 
have attracted much interest because of their 
potential for the treatment of these diseases. 

Several NMR structural studies of strome¬ 
lysin (123-127) and collagenase (128, 129) 
complexes have been reported. The secondary 
structure and global fold have been found to 
be quite similar for the catalytic domains of 
both enzymes and their various complexes 
with ligands. The active site in each enzyme is 
a cleft spanning the width of the enzyme, with 
a catalytic zinc atom coordinated by three his¬ 
tidine residues located in the center. Different 
dynamic properties of active-site residues in 
stromelysin/ligand complexes (3) and of colla¬ 
genase with and without bound inhibitor (128, 
129) have been reported. It has been proposed 
that structural and dynamic differences can 
be exploited in structure-based drug design to 
achieve broad inhibitor activity against sev¬ 
eral MMPs or to obtain more selective inhibi¬ 
tion (3). 

Of recent interest have been structural 
data on a novel class of MMP-binding inhibi¬ 
tors, represented by PNU-107859 (13) and 
PNU-142372 (14), which contain a thiadiazole 
moiety that coordinates the catalytic zinc 
atom through its exocyclic sulfur atom (130). 




Isotope editing/filtering studies played an
important role in defining interactions between
the ligands and stromelysin. For example,
for the stromelysin/PNU-107859 complex
a 3D ¹²C-filtered, ¹³C-edited NOESY spectrum
recorded on the [¹²C,¹⁴N]PNU-107859/[¹³C,¹⁵N]stromelysin
complex was used to assign protein/ligand
NOEs. Of the 11 observed NOEs
between the ligand and protein aliphatic protons,
nine involved the aromatic ring of (13)
and one involved the terminal methyl group.
NOEs were observed between (13) and protons
of Tyr155, His166, Tyr168, and Ala169. All
four of these residues are located in the S1-S3
binding sites on one side of the active site.
Comparison of 2D ¹H-¹⁵N HSQC spectra
showed that differences between the ¹H and
¹⁵N chemical shifts for the stromelysin/13 and
stromelysin/14 complexes are concentrated in
the active site, indicating that no gross
conformational differences in protein structure
exist. The aromatic rings of (13) and (14) bind in
the same region of the protein.

Figure 12.26. Region of the 1D ¹⁹F spectrum of the stromelysin/PNU-142372 complex. Signals from
free (sharp) and bound (broad) PNU-142372 are observed. (Adapted from Ref. 3 and reprinted with
permission from Elsevier Science.)


A region of the 1D ¹⁹F spectrum of the
stromelysin/14 complex is shown in Figure
12.26 (3). Two separate resonances were observed
for the two ortho fluorine atoms of the
bound ligand in contrast to the single resonance
observed for both ortho protons of
stromelysin-bound (13), indicating that the
ring flip rate (rotation about the Cβ-Cγ bond)
is reduced for stromelysin-bound (14) compared
to stromelysin-bound (13). A ring flip
rate of approximately 100 s⁻¹ was estimated
from the difference in linewidths for the
bound ortho and para fluorine atom resonances
of (14), more than two orders of magnitude
slower than the ring flip rate for (13).
The ¹⁹F spectrum in Figure 12.26 illustrates
several general principles that are useful in
NMR studies of ligand-macromolecule complexes.
First, note that the use of a rare probe
nucleus such as ¹⁹F produces spectra of elegant
simplicity. Because there is no naturally
occurring ¹⁹F in the macromolecule, it generates
no interfering signals. Second, the offset
in chemical shift between bound and free signals
reflects the different environments of the
bound and free states. Third, signals from the
bound ligand are broader than those from the
free ligand because of the higher molecular
weight of the complex but are still clearly
visible for a complex of this size.
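
The estimate of a ring flip rate from a linewidth difference can be illustrated with the standard slow-exchange relation, in which a resonance is broadened by k/π (in Hz) relative to one that does not sense the exchange. The Python sketch below assumes that relation applies and that the para fluorine resonance serves as the unbroadened reference. The linewidth values are hypothetical, chosen only to show the arithmetic, and are not taken from Ref. 3.

```python
import math


def slow_exchange_rate(lw_exchanging_hz: float, lw_reference_hz: float) -> float:
    """Estimate a rate constant (s^-1) from excess linewidth in slow exchange.

    Assumes the exchanging resonance is broadened by k/pi (Hz, full width at
    half height) relative to a reference resonance that is unaffected by the
    exchange process but otherwise relaxes similarly.
    """
    excess_hz = lw_exchanging_hz - lw_reference_hz
    if excess_hz < 0:
        raise ValueError("exchanging resonance should not be narrower than the reference")
    return math.pi * excess_hz


# Hypothetical linewidths (Hz), chosen only to illustrate the arithmetic.
ortho_lw_hz = 62.0  # resonance broadened by ring flipping
para_lw_hz = 30.0   # resonance unaffected by ring flipping

k_flip = slow_exchange_rate(ortho_lw_hz, para_lw_hz)
print(f"estimated ring flip rate: {k_flip:.0f} s^-1")
```

Under these assumptions, an excess linewidth of about 32 Hz corresponds to a rate of about 100 s⁻¹, the order of magnitude quoted in the text.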

NMR studies have also been reported for 
ligands bound to collagenase. Interest so far 
has focused on hydroxamate-containing li¬ 
gands, where it has been shown that binding 
causes a decrease in mobility of some but not 
all active-site residues (128, 129). Interest¬ 
ingly, some active-site residues adjacent to 
residues that interact directly with inhibitor 
were found to have high mobility both in the 
presence and the absence of inhibitor (129).
This contrasts with what is observed for
stromelysin complexed to hydroxamate ligands,
and a more complete understanding of
the dynamics of the respective interactions
may provide critical information for drug
design (3).

Hydroxamate-containing ligands have also
featured in other NMR studies, this time using
transferred NOEs to determine their bioactive 
conformations (131). TrNOE data were used 
to determine the conformation of the inhibi¬ 
tors when bound to stromelysin. The NOE- 
derived structures of the bound inhibitors 
were used as templates to screen a database of 
260,000 compounds. Eighteen of the 23 com¬ 
pounds identified for which stromelysin bind¬ 
ing data were available had affinities less than 
200 nM, demonstrating the value of deriving a 
conformationally restricted template for 
structure-based drug design (131). This study
also demonstrates the close synergy that ex¬ 
ists between structure-based design and 
screening approaches, either in silico or exper¬ 
imental. 

3.2.4.4 Dihydrofolate Reductase. Dihydrofolate
reductase (DHFR) is an important
intracellular enzyme that is the target of several
clinically used drugs, including methotrex¬ 
ate (15), an anticancer compound, and tri¬ 
methoprim (16), an antibacterial. These act by 
inhibiting the enzyme in malignant cells and 
parasites, respectively. The small size of 
DHFR (18-20 kDa) makes it amenable to 
structural studies and there have been numer¬ 
ous complexes determined using both X-ray 
and NMR methods. The focus here will be on a 
recent illustrative example of the structure of 
a new complex of DHFR with trimetrexate 
(17). Trimetrexate was initially investigated
as an antimalarial agent but has subsequently 
been found to have antineoplastic activity 
against breast, neck, and head cancers. It has 
also been used as an antibacterial for the 
treatment of Pneumocystis carinii pneumonia 
in AIDS patients. As seen from the following 
structures, trimetrexate combines some of the 
features of trimethoprim and methotrexate: 



(16) 


(17) 

Figure 12.27. Stereoview of a superposition over the backbone atoms (N, Cα, and C) of residues
1-162 of the final 22 structures of the DHFR-trimetrexate complex. (a) View of the protein backbone
and the trimetrexate heavy atoms, (b) View of trimetrexate in the binding site of enzyme, (c) Con¬ 
formation of trimetrexate in the binding site of enzyme. The orientation of trimetrexate is identical 
for (a)-(c) and only its heavy atoms are shown. (Reprinted with permission from Ref. 132. Copyright 
1999 The Protein Society.) 


The three-dimensional structure of the 
complex of DHFR with trimetrexate was de¬ 
termined using about 2000 distance re¬ 
straints, 300 angle restraints, and 100 hydro¬ 
gen-bonding restraints (132). Simulated 
annealing calculations produced a family of 22 
structures consistent with the constraints. 
Several intermolecular protein-ligand NOEs 
were obtained by using a novel approach that 
monitored temperature effects of NOE signals 
resulting from dynamic processes in the 
bound ligand. At low temperature (5°C) the 
trimethoxy ring of bound trimetrexate flips
sufficiently slowly to give narrow signals in
slow exchange, which give good NOE cross
peaks. At higher temperature these broaden
and their NOE cross peaks disappear, thus
allowing the signals in the lower temperature
spectrum to be identified as NOEs involving
ligand protons. Figure 12.27 shows the structure
of the complex, including the orientation
of the ligand in the binding site.

The binding site for trimetrexate is well defined
and was compared with the binding sites
in related complexes formed with methotrexate
and trimethoprim. No major conformational
differences were detected between the
different complexes. The 2,4-diaminopyrimidine-containing
moieties in the three drugs
bind essentially in the same binding pocket
and the remaining parts of their molecules
adapt their conformations such that they can
make effective van der Waals interactions
with essentially the same set of hydrophobic
amino acids. The side-chain orientations and
local conformations are not greatly changed in
the different complexes.

Figure 12.28. Correlated motions of a carboxylate
group from methotrexate and Arg57 of DHFR detected
by NMR. (a) Structure of an arginine-carboxylate
complex formed with symmetrical end-on
interactions and (b) structure of methotrexate
showing its interaction with the guanidine group
of Arg57 of DHFR. (Reprinted with permission
from Ref. 133.)

The ring flipping of the trimethoxy aro¬ 
matic ring mentioned above was detected by 
variable-temperature studies of the spectral 
line shape. Such dynamic processes
involving the ligand appear to be not
uncommon in macromolecule-ligand complexes,
and the ability of NMR methods to detect
such phenomena represents one distinct
advantage of NMR over X-ray methods of 
structure determination. Relaxation measurements
were also used to probe dynamics of the
protein and no large amplitude motions were 
found, apart from that at the C-terminus 
(132). The power of NMR methods for study¬ 
ing dynamics of complexes is further illus¬ 
trated by an earlier study of the complex of 
DHFR with methotrexate (133). In this case a 
correlated dynamic rotation of a carboxylate 
group on the ligand and Arg57 of the protein
was detected, as illustrated in Fig. 12.28. 

3.2.4.5 HIV Protease. Because of its essential
role in the HIV life cycle, HIV protease is a
major target for structure-based design of 
anti-AIDS drugs. There are now more than 
100 structures of HIV protease and protease 
inhibitor complexes in the HIV-protease 
structure database (134-136) and the avail¬ 
ability of this wealth of high resolution struc¬ 
tural information has been the driving force 
behind numerous structure-based design programs
(134, 135, 137). Most of the high resolution
structural information on HIV protease
has been obtained from X-ray crystallography
data (136). Although there are relatively few
examples of HIV protease/inhibitor complexes
that have been determined by use of NMR
spectroscopy, the NMR data, taken together
with the structural data from X-ray experiments,
have contributed to an understanding
of protease-inhibitor recognition and dynamics.
Indeed, studies of HIV protease/inhibitor
complexes are a powerful example of the way
in which complementary information obtained
from X-ray crystallography and NMR
spectroscopy can be used to facilitate
structure-based drug design.

Figure 12.29. (a) View of the superimposed
heavy atoms (N, Cα, C) of the ensemble
of structures of the HIV-1 protease/DMP323
complex. (b) Ribbon diagram of the minimized
average structure of the complex. (Reprinted
with permission from Ref. 138. Copyright 1996
The Protein Society.)

HIV protease/inhibitor complexes have a 
molecular weight of approximately 22 kDa. Al¬ 
though NMR spectroscopy is well suited to de¬ 
termination of the structure of molecules in 
this size range, efforts to determine the solu¬ 
tion structure of the complex were hampered 


by the fact that the protease undergoes rapid 
autocatalysis in solution. It required the
development of potent inhibitors before NMR
studies of the complex became feasible. The
first solution structure (Fig. 12.29) of HIV
protease bound to the cyclic urea inhibitor
DMP-323 (18) was reported in 1996 (138).



( 18 ) 

The protease exists as a homodimer. Each
99-residue monomer contains 10 β-strands
and the dimer is stabilized by a four-stranded
antiparallel β-sheet formed by the N- and
C-terminal strands of each monomer. The active
site of the enzyme is formed at the interface,
where each monomer contributes a catalytic
triad (Asp25-Thr26-Gly27) that is responsible
for cleavage of the protease substrates. The
"flap region" is located above the reactive site
and is formed by a hairpin from each monomer
of two antiparallel β-strands joined by a
β-turn. There is little difference between the
solution and crystal structures of protease-in¬ 
hibitor complexes, except in those regions 
where the polypeptide chain is disordered. 
However, experiments in solution have al¬ 
lowed access to parameters that are not 
directly accessible from crystal data. These pa¬ 
rameters, such as the amplitude and fre¬ 
quency of backbone dynamics, the protona- 
tion states of the catalytic aspartate residues, 
and the rate of monomer interchange, are es¬ 
sential in understanding the interaction of 
HIV protease with potent inhibitors. 

The cyclic urea inhibitor DMP-323 was de¬ 
signed by analysis of crystal structures of HIV 
protease/inhibitor complexes. A feature common
to many of the complexes of HIV protease
is a buried water molecule that bridges the
inhibitor and Ile50 in the flaps. Interactions
with this water molecule are thought to induce
the fit of the flaps over the inhibitor
(139). In contrast, mammalian aspartic
protease/inhibitor complexes are unable to
accommodate an equivalent water molecule
(135). This observation led to the design of a 
series of cyclic urea-based inhibitors that are 
capable of displacing the buried water molecule
(139). As well as improving the specificity
of inhibitors for the viral protease, displacement
of the water molecule was expected to
increase the entropic contribution to inhibitor 
binding and thus enhance the affinity of com¬ 
plex formation. The cyclic urea inhibitors are 
highly potent and specific inhibitors of HIV 
protease (139) and for DMP-323 it has been 
shown in both the crystal structure (139) and 
in solution (140) that the urea moiety does 
indeed replace the buried water molecule. 

Although DMP-323 replaces one buried 
water molecule, several others are observed in
the crystal structure of the complex. A more
recent NMR study investigated the role of 
these water molecules to determine whether 
any had a structural role in the formation of 
the HIV protease/DMP-323 complex (141). In 
favorable cases, NMR can be used to estimate 
the residence times of hydration water mole¬ 
cules (142), thus providing information about 
the timescale of the interaction of buried wa¬ 
ter with the bulk solvent. This analysis led to 
the identification of a symmetry-related pair 
of water molecules that may have a structural 
role in formation of the complex. Such infor¬ 
mation may prove useful in the design of fu¬ 
ture cyclic urea inhibitors. An interesting 
finding in this study was the fact that each of 
the hydroxyl protons of DMP-323 is in rapid 
exchange with solvent. This is a surprising re¬ 
sult, given that two of these hydroxyl protons 
are completely buried and form a network of 
hydrogen bonds with the catalytic Asp25/Asp125
side chains (143). Furthermore, the
dissociation rate of DMP-323 is less than 1 s⁻¹
under the conditions of the experiment, which 
is too small to average the chemical shifts of 
the hydroxyl protons and the bulk water. The 
observation is ascribed to local fluctuations in 
the complex that allow solvent molecules to 
penetrate into the binding site. This conclu¬ 
sion is supported by the observation that the 
catalytic protons of the Asp25/Asp125 side
chains in the protease/DMP-323 complex un¬ 
dergo H-D exchange with solvent, even 
though they are buried and hydrogen bonded 
to the inhibitor (143). These studies highlight 
that even well-ordered structures such as the 
protease/DMP-323 complex may be flexible on 
the millisecond to microsecond timescale. 
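
The argument that a dissociation rate below 1 s⁻¹ is too slow to average the hydroxyl and bulk water chemical shifts can be made concrete by comparing an exchange rate with the frequency separation of the two resonances. The Python sketch below classifies the exchange regime under assumed values: the 500 Hz shift difference is hypothetical (about 1 ppm at a 500 MHz ¹H frequency), and the factor-of-ten margin separating the regimes is an illustrative convention rather than a value from the cited studies.

```python
import math


def exchange_regime(k_ex: float, delta_nu_hz: float, margin: float = 10.0) -> str:
    """Classify two-site exchange as 'slow', 'intermediate', or 'fast'.

    k_ex is the exchange rate (s^-1) and delta_nu_hz is the chemical-shift
    difference between the two sites (Hz). The rate is compared with the
    angular frequency separation 2*pi*delta_nu; 'margin' is an illustrative
    safety factor, not a rigorous boundary.
    """
    delta_omega = 2.0 * math.pi * abs(delta_nu_hz)
    if k_ex >= margin * delta_omega:
        return "fast"
    if k_ex <= delta_omega / margin:
        return "slow"
    return "intermediate"


# Hypothetical shift difference: about 1 ppm at a 500 MHz 1H frequency.
DELTA_NU_HZ = 500.0

# Dissociation at the 1 s^-1 upper bound quoted in the text.
print(exchange_regime(1.0, DELTA_NU_HZ))
# A hypothetical process on the microsecond timescale.
print(exchange_regime(1.0e6, DELTA_NU_HZ))
```

Under these assumptions, dissociation at 1 s⁻¹ falls far inside the slow-exchange regime, which is consistent with the text's conclusion that a faster local process must be responsible for the observed averaging.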

Interestingly, in the DMP-323 complex,
both of the catalytic Asp25/Asp125 side chains
are protonated over the pH range 2-7 (143).
The protonated Asp25/Asp125 residues form a
network of hydrogen bonds with the hydroxyl
groups of DMP-323. In contrast, it has been
shown that in the complex with the asymmetric
inhibitor KNI-272, the side chain of Asp25
is protonated, whereas that of Asp125 is not. A
suggested explanation for this is that both
oxygens of the Asp125 side chain are deprotonated
to accept two hydrogen bonds, one from
a bound water molecule and one from the
inhibitor. In contrast, the side chain of Asp25 is
protonated so that it can donate a hydrogen
bond to the inhibitor (144). Consequently,
the protonation state of the enzyme is influ¬ 
enced strongly by interaction with specific 
inhibitors and this knowledge is essential for a 
detailed understanding of the protease/drug 
interactions. 

NMR has also been used to study the rela¬ 
tionship between flexibility and enzymatic 
function for HIV protease. For the protease/ 
DMP-323 complex, 15 N spin-relaxation stud¬ 
ies determined that residues that are flexible 
correlate well with residues that are disor¬ 
dered in the NMR structure of the complex 
(145). For example, residues in poorly defined 
loops were found to undergo large-amplitude 
internal motions on the nanosecond-picosec¬ 
ond timescale. In contrast, two regions of the 
molecule were found to exhibit motions on the 
millisecond-microsecond timescale. The first 
of these is at the N-terminus of the protein 
around Thr4-Leu5. This is adjacent to the major
site of autolysis of the protease and it has
been suggested that the rate of cleavage may 
regulate HIV protease activity in vivo (146). 
Consequently, the observed flexibility may be 
important for regulation of protein function. 
The second region found to be undergoing mil¬ 
lisecond-microsecond motion was the tips of 
the flaps around Ile50-Gly51. In crystal structures,
this region of the protease is well ordered
and not involved in crystal contacts,
although its conformation varies from structure
to structure. This motion is interpreted as a 
dynamic conformational exchange process, 
which is fast relative to the chemical-shift 
timescale. Thus when the protease is bound to 
a symmetric inhibitor in solution, this confor¬ 
mational exchange results in the chemical 
shifts of the flap residues in the two monomers 
being identical (138, 145). In contrast, when 
the protease is bound to an asymmetric inhib¬ 
itor, such as KNI-272, crystal structures show 
that each monomer interacts with the inhibi¬ 
tor in a different way (144). This is reflected in 
the fact that the chemical shifts of the mono¬ 
mers are different when asymmetric inhibi¬ 
tors are bound (141,147). Analysis of spectra 
from such an asymmetric complex has re¬ 
vealed that the inhibitor is capable of "flip¬ 
ping" its orientation with respect to the two 
monomers without dissociating from the com¬ 


plex (148). These data again highlight the im¬ 
portance of defining both the structural and 
dynamic aspects of binding to understand the 
requirements for potent interactions between 
HIV protease and its inhibitors. 

The development of inhibitors of HIV pro¬ 
tease represents a major success for structure- 
based drug design. When HIV was first identi¬ 
fied in the early 1980s there were no known 
drugs effective for treatment of infection. A 
combination of X-ray crystallography, NMR 
spectroscopy, computer modeling, and chemi¬ 
cal synthesis has resulted in the development 
of several effective HIV protease inhibitors. 
However, in common with other retroviruses, 
HIV has a high transcription error rate that 
results in a rapid mutational rate. One of the 
results of this is the production of a divergent 
population of viruses in which the sequence of 
the HIV protease produced may differ sub¬ 
stantially (149, 150). As a consequence, drug- 
resistant strains of the virus emerge. Clearly, 
knowledge of the structural principles that 
govern inhibition of the protease and the 
mechanism by which the virus develops resis¬ 
tance will continue to be important in the de¬ 
velopment of effective new drugs. 

4 NMR SCREENING 

In the past, NMR was predominantly used in 
the design stage of drug discovery rather than 
the screening stages. Recently, new methods 
that make use of NMR to screen ligands for 
binding to a protein target have been devel¬ 
oped and are proving to be a powerful tool in 
the discovery of new drug leads. This section 
gives an overview of the various experimental 
methods, summarized in Table 12.8, which 
can be used to screen mixtures of ligands for 
binding to a drug target. There will also be a 
brief discussion on the practical consider¬ 
ations that need to be made when designing an 
NMR screening program. 

4.1 Methods 

4.1.1 Chemical-Shift Perturbation. Chemi¬ 
cal shift is a function of the chemical (and 
hence magnetic) environment that individual 
nuclei experience. Perturbations of chemical 
Table 12.8 Summary of the Methods Available for NMR Screening and Their Respective Characteristics

Screening methodology | Signals observed | Protein size limit | Labeling | Binding information obtained | Kd limit | Kd determined | Suitable for HTS | Mixture deconvolution required?
--- | --- | --- | --- | --- | --- | --- | --- | ---
Chemical-shift perturbation (e.g., SAR by NMR) | Protein | <30 kDa | ¹⁵N protein | Location | 10⁻³ to 10⁻⁷ M | Yes | Yes | Yes
STD | Ligand | None | None | Orientation | 10⁻³ to 10⁻⁷ M | No | Yes | No
Diffusion-based (e.g., affinity NMR) | Ligand | None | ²H protein for isotope editing | None | 10⁻³ to 10⁻⁷ M | Yes | Yes | No
Relaxation-based | Ligand | None | None | None | 10⁻³ to 10⁻⁷ M | Yes | Yes | No
trNOE | Ligand | None | None | Bound conformation | 10⁻³ to 10⁻⁷ M | No | Yes | No
NOE pumping | Ligand or protein (reverse) | None | None | Bound conformation | 10⁻³ to 10⁻⁷ M | No | Yes | No; Yes(a)
Spin labeling | Ligand or protein | None | Spin label for either ligand or protein | Orientation, simultaneous binding | 10⁻³ to 10⁻⁶ M | No | Yes(b) | Yes(b)

(a) For reverse NOE pumping.
(b) For primary screening if the protein is spin-labeled or for second-site screening if the first-site ligand is spin-labeled.

Figure 12.30. Summary of the SAR by NMR drug discovery methodology. A protein target is
screened against a library consisting of small organic molecules by use of the ¹H/¹⁵N HSQC
experiment. When two ligands that bind in close proximity are identified, they are linked to form a
composite ligand with an increased affinity for the target.


shifts can be used to detect binding of a ligand 
to a protein target. When a ligand binds to a 
protein the local chemical environment is 
changed, and this is reflected by a change in 
the chemical shifts of nuclei in close proximity 
to the ligand-binding site. The most common
experiment used in this screening methodology
is the ¹H/¹⁵N HSQC, which generates a discrete
signal for each amide group within the
protein. A reference ¹H/¹⁵N HSQC spectrum,
which is acquired in the absence of potential 
ligands, is compared to a spectrum recorded in 
the presence of ligands and any changes in the 
amide chemical shifts are indicative of a ligand 
binding to a location close to the correspond¬ 
ing amide groups. The major advantage of this 
technique is that, if the NMR assignment of 
the amide resonances is known, then the site 
of binding for each ligand can be determined. 


This is a valuable piece of information in the
development of more potent second-generation
drug leads. Binding affinities can also be
determined by measuring the change in chemical
shift as a function of ligand concentration.
One technique that utilizes this screening
method for drug design is “SAR-by-NMR,”
developed by Fesik and coworkers (1, 4, 151-155).
SAR-by-NMR is a fragment-based drug
design approach in which a potent drug candidate
is derived by chemically linking two or
more small, low-affinity ligands for a target. In
theory, the binding energy of the linked compounds
will be the sum of the binding energies
of the two individual compounds plus contributions
to binding energy attributed to linkage.
Therefore, it is possible to generate a drug
lead with a nanomolar dissociation constant
(Kd) from two milli- to micromolar fragments.
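
The statement that binding affinities can be obtained from the change in chemical shift as a function of ligand concentration can be illustrated with a minimal sketch. It assumes 1:1 binding in fast exchange, so that the observed shift change is the maximum shift change multiplied by the fraction of protein bound; the concentrations, the "true" Kd, and the grid-search fit are illustrative assumptions and are not taken from the studies cited here.

```python
import math


def fraction_bound(protein_total, ligand_total, kd):
    """Fraction of protein occupied for 1:1 binding (all concentrations in M)."""
    b = protein_total + ligand_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * protein_total * ligand_total)) / 2.0
    return complex_conc / protein_total


def fit_kd(protein_total, ligand_totals, shifts, kd_grid):
    """Grid search over Kd; the maximum shift is solved by linear least squares."""
    best = None
    for kd in kd_grid:
        f = [fraction_bound(protein_total, lig, kd) for lig in ligand_totals]
        delta_max = sum(fi * si for fi, si in zip(f, shifts)) / sum(fi * fi for fi in f)
        sse = sum((delta_max * fi - si) ** 2 for fi, si in zip(f, shifts))
        if best is None or sse < best[0]:
            best = (sse, kd, delta_max)
    return best[1], best[2]


# Hypothetical titration: 0.3 mM protein, ligand from 0.1 to 10 mM.
protein = 0.3e-3
ligands = [0.1e-3, 0.3e-3, 1e-3, 3e-3, 10e-3]
true_kd, true_delta_max = 1.0e-3, 0.20  # assumed values (M, ppm)
observed = [true_delta_max * fraction_bound(protein, lig, true_kd) for lig in ligands]

kd_grid = [10 ** (-6 + 0.01 * i) for i in range(501)]  # 1 uM to 100 mM
kd_fit, delta_max_fit = fit_kd(protein, ligands, observed, kd_grid)
print(f"fitted Kd = {kd_fit:.2e} M, maximum shift change = {delta_max_fit:.3f} ppm")
```

With real data the shifts would carry noise and a proper nonlinear fit with error estimates would normally replace the grid search.
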
The first step in this process (Fig. 12.30)
involves screening a library of ligands (typically
with a MW < 400) in mixtures of up to 10
for binding to a protein target by comparing
the ¹H/¹⁵N HSQC spectrum of a ¹⁵N-enriched
protein in both the presence and the absence
of ligands. Any ligand-induced changes in the
chemical shift of the nitrogen and amide proton
signals indicate binding of one or more
ligands in the mixture to the protein target.
The mixture containing the binding ligand(s)
is deconvoluted and each individual compound
screened to identify the individual ligand(s)
responsible for the observed chemical-shift
perturbations. Once a binding ligand is identified,
analogs can be screened to optimize
binding.
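
The cost of the mixture-and-deconvolute strategy described above can be estimated with a short calculation. The sketch below is only an accounting exercise; the library size and the number of mixtures that show perturbations are hypothetical, and the function name is introduced here for illustration.

```python
import math


def spectra_required(library_size, mixture_size, hit_mixtures):
    """Upper bound on HSQC spectra: one reference, one per mixture,
    and one per compound in every mixture that showed perturbations."""
    primary = math.ceil(library_size / mixture_size)
    deconvolution = hit_mixtures * mixture_size
    return 1 + primary + deconvolution


# Hypothetical screen: 1000 compounds, mixtures of 10, 5 mixtures with hits.
pooled = spectra_required(1000, 10, 5)
one_by_one = spectra_required(1000, 1, 0)
print(pooled, one_by_one)
```

Under these assumed numbers, pooling needs far fewer spectra than screening each compound separately, which is why the amount of labeled protein consumed depends strongly on the hit rate.
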

A second ligand, which binds at a proximal 
site, is then identified either from the original 
screen or by repeating the library screening 
with the first ligand site bound to the protein. 
This ligand is then optimized and the struc¬ 
ture of the ternary complex determined by use 
of either NMR or X-ray crystallography. The 
ternary complex structure provides informa¬ 
tion on the conformation and orientation of 
the bound ligands, which facilitates the syn¬ 
thesis of hybrid molecules where the two li¬ 
gands are joined by a suitable linking moiety. 

There are several examples that illustrate
the potential of SAR by NMR. As noted earlier,
FK506 binding protein (FKBP) inhibits calcineurin
and blocks T-cell activation when
complexed to the immunosuppressant FK506.
This protein was used as a target for SAR by
NMR screening and subsequently two ligands,
(19) and (20), were identified with Kd values
of 2 µM and 0.1 mM, respectively. A model of
the ternary complex between the protein and
both ligands was produced, which indicated
that the methyl ester of (19) was close to the
benzoyl hydroxyl group in (20). These two
groups were linked with alkyl chains of various
lengths, with the most active compound
(21) having a three-carbon linker and a Kd
value of 19 nM (151).

[Structures (19), (20), and (21)]
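
The additivity argument behind SAR by NMR can be checked against the FKBP numbers quoted above. The sketch below assumes a temperature of 298.15 K and a 1 M standard state (neither is stated in the text) and treats whatever is left over after summing the fragment free energies as the contribution attributed to linkage.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)
T = 298.15       # assumed temperature, K


def delta_g(kd_molar):
    """Standard binding free energy in J/mol for a dissociation constant in M."""
    return R * T * math.log(kd_molar)


kd_19 = 2e-6    # fragment (19), 2 uM
kd_20 = 1e-4    # fragment (20), 0.1 mM
kd_21 = 19e-9   # linked compound (21), 19 nM

ideal_dg = delta_g(kd_19) + delta_g(kd_20)   # perfect additivity
ideal_kd = math.exp(ideal_dg / (R * T))      # equals kd_19 * kd_20
linkage_term = delta_g(kd_21) - ideal_dg     # positive value means a penalty

print(f"ideal Kd = {ideal_kd:.1e} M")
print(f"linkage contribution = {linkage_term / 1000:.1f} kJ/mol")
```

Under these assumptions the linked compound falls short of perfect additivity by roughly 11 kJ/mol, yet it still binds about 100-fold more tightly than the better of the two fragments.
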

Inhibitors of the matrix metalloproteinase
(MMP) stromelysin have also been designed
through the use of the SAR-by-NMR screening
methodology. As mentioned in section
3.2.5.3, MMPs are involved in matrix degradation
and tissue remodeling, with overexpression
of these enzymes being associated with
arthritis and tumor metastasis. Acetoxyhydroxamate
(22) was used as one ligand because
it was known previously that MMP inhibitors
contain a hydroxamate moiety. The
Kd value of (22) was determined to be 17 mM.
To identify a second ligand, the protein was
screened against a ligand library in the presence
of saturating amounts of (22). The library
was biased for hydrophobic compounds,
given that stromelysin demonstrates a substrate
preference for hydrophobic amino acids
and structural studies had identified a hydrophobic
binding pocket supporting this
observation. From the library screen a series
of biphenyl compounds were identified and analogs
of these compounds were synthesized. A
biphenyl derivative (23) was produced with a
Kd value of 0.02 mM. The NMR structure of a
ternary complex, consisting of stromelysin,
(22), and the biaryl derivative (24) (chosen for
its superior aqueous solubility), was determined
and indicated that the methyl group of
(22) was in close proximity to the pyrimidine
ring of (24). With this information, (22) and
(23) were subsequently linked by linkers of
different lengths and the most active compound
produced, (25), had a Kd value of 15 nM (154).

[Structures (22)-(25)]

A variation of SAR-by-NMR is to optimize
binding or improve the pharmacological properties
of known drug leads generated by other
methods (e.g., natural products isolation or
combinatorial chemistry). A compound can be
fragmented into individual subunits and then
alternative fragments identified through use
of ¹H/¹⁵N HSQC screening. These fragments
can then be incorporated into the molecular
structure in the hope of improving the binding
and/or pharmacological properties of the parent
compound (Fig. 12.31). The alternative
fragment must bind in the same location
as the corresponding section of the original
molecule, which makes the ¹H/¹⁵N HSQC screening
method ideal because it provides information on the
binding site of ligands.

In a demonstration of this fragmentation
method, an antagonist of the interaction between
leukocyte function-associated antigen 1
(LFA-1) and intercellular adhesion molecule 1
(ICAM-1) was used as a starting molecule.
This interaction plays a role in the inflammatory
response and specific T-cell immune responses,
and inhibitors have applications in
the treatment of inflammation and organ
transplant rejection. The p-arylthio cinnamide
antagonist (26) had an IC50 value of 44
nM; however, it was envisaged that the molecule's
activity and physical properties could be
improved by replacing the isopropyl phenyl
group with a more hydrophilic moiety. Screening
of a 2500-compound library provided several
hits, and analogs of (26) were made that
Figure 12.31. The fragment optimization approach
developed from SAR by NMR. A known ligand
of a protein is broken into fragments and small
molecules based on the fragments are screened for
binding. Any molecules that are found to bind can
then be incorporated into the original lead compound
with the hope of improving its binding and/or
physicochemical properties. [Figure step label:
Incorporate alternative fragment into original lead]



incorporated these ligands in place of the isopropyl
phenyl group. Compounds (27) and
(28) had both improved aqueous solubility and
pharmacokinetic profiles, with similar or improved
activity (IC50 values of 20 and 40 nM,
respectively) when compared to that of the
parent compound (26) (156).

[Structure (28)]

Many compounds bind to human serum albumin
(HSA), which significantly reduces
their in vivo activity and hence their potential
as a drug lead. The fragmentation method has
recently been used to find analogs of diflunisal
(29) that have a reduced affinity toward HSA
(157). Diflunisal is a nonsteroidal anti-inflammatory
that is more potent, longer acting, and
better tolerated in vivo than aspirin. However,
99% of diflunisal is bound to albumin in
plasma and, as a result, high doses are required
for it to be effective. Structural studies
of the diflunisal/HSA-III complex indicated
that, by introducing polar functionality to the
difluorophenyl moiety, binding affinity to
HSA may be reduced without affecting activity.
A series of organic compounds analogous
to the difluorophenyl fragment were screened
using the ¹H/¹⁵N HSQC chemical-shift perturbation
method and several alternative fragments
were identified. These were incorporated
into the diflunisal structure, resulting in
a number of analogs (e.g., 30 and 31) with
reduced affinity for HSA but still maintaining
some activity. It was predicted that the next
generation of compounds would incorporate
the fluorine atoms present in (29) into leads
such as (30) and (31) because the presence of
fluorine increased activity without increasing
the affinity for HSA.

[Structures (29), (30), and (31)]

Overall, SAR-by-NMR is already showing
great promise as a major new tool in drug design,
but it has a few limitations. First, the
screening of large ligand libraries and any subsequent
deconvolution steps can be expensive,
in that a large amount of ¹⁵N-labeled protein
is needed. However, as already noted, recent
advances in cryoprobe technology have reduced
the amount of protein required. The use
of smaller, more druglike libraries such as that
described in the SHAPES ideology (7) could
also be used to reduce the amount of protein
required for screening. The second limitation
is that the NMR assignments for the protein
must be known or determined. This limits the
size of the protein target to 30 kDa or less,
although this value will presumably increase
as TROSY-based NMR technologies for structure
determination advance. The method also
requires the comparison of ¹H/¹⁵N HSQC spectra of
the protein in both the absence and the presence
of ligands. Changes in solvent conditions
such as pH, polarity, salt concentration, and
viscosity may cause shifts in amide resonances,
leading to false positives (158).

Recently, a protocol was described that
screens ligands using the ¹H/¹⁵N HSQC experiment
but includes a mass spectrometry prescreening
step. Ligand mixtures are added to a
protein target and then subjected to size-exclusion
chromatography, which separates ligand/protein
complexes from free ligands. The
identity of the bound ligands can then be determined
using MS data and, once identified,
screened by ¹H/¹⁵N HSQC to determine the
location and specificity of binding (159). This
MS/NMR methodology reduces the amount of
¹⁵N-labeled protein required because only a
fraction of the library is screened by NMR and
there is no deconvolution step.

4.1.2 Magnetization Transfer Experiments. 

Proteins contain a large network of
dipole-dipole interactions, resulting in the efficient
transfer of magnetization throughout
the molecule. The saturation transfer difference
(STD) experiment (Fig. 12.32) uses this
phenomenon to detect the binding of ligands
to a protein. It relies on the fact that saturation
of a single protein resonance results in
saturation of all protein resonances and any
ligands that bind to the protein, provided they
are not affected directly by the selective saturation
pulse (160). The STD experiment is
able to detect the binding of ligands with a KD
between 10⁻³ and 10⁻⁸ M.

Figure 12.32. A schematic representation of the saturation transfer effect. The protein resonances
are saturated (indicated by shading) by a selective pulse and spin diffusion. Resonances of nonbinding
ligands (triangles) are not affected by this pulse but ligands that are interacting with the protein
(ellipses) will also become saturated. These interacting ligands are transferred to solution through
chemical exchange where they are detected. (Reprinted with permission from Ref. 16. Copyright
1999, Wiley-VCH.)

An STD experiment consists of irradiating
an isolated protein resonance (either at low or
high field) with a series of pulses that saturate
the entire protein and any binding ligands.
This results in a spectrum containing reduced
signal intensities from both the protein and
the ligands that bind to it. A second spectrum
of the protein and ligand library is then recorded
with the saturation pulses off-resonance.
Subtraction of these two spectra results
in the STD spectrum that shows only
those ligands that bind to the protein (residual
protein resonances are removed using a T2
relaxation filter). This subtraction occurs internally
through phase cycling after every scan,
to reduce artifacts attributed to temperature
or magnetic field variations (160, 161).
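
The subtraction step can be sketched numerically. The example below works on a handful of hypothetical peak intensities rather than on real spectra, and the 5% threshold for calling a binder is an arbitrary assumption chosen for illustration.

```python
def std_spectrum(off_resonance, on_resonance):
    """Difference spectrum: off-resonance (reference) minus on-resonance (saturated)."""
    return [off - on for off, on in zip(off_resonance, on_resonance)]


# Hypothetical ligand peak intensities for a three-compound mixture,
# after a T2 filter has removed the residual protein signals.
peaks = ["compound A", "compound B", "compound C"]
off_resonance = [100.0, 80.0, 120.0]
on_resonance = [100.0, 68.0, 120.0]   # only compound B loses intensity

difference = std_spectrum(off_resonance, on_resonance)
threshold = 0.05  # assumed minimum fractional STD effect
binders = [
    name
    for name, diff, reference in zip(peaks, difference, off_resonance)
    if diff / reference > threshold
]
print(difference, binders)
```

Only the compound whose signal is attenuated by saturation transfer survives the subtraction, which is why no deconvolution of the mixture is needed as long as the ligand resonances can be told apart.
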

An STD element can be added to most forms of
NMR experiments including COSY, TOCSY,
NOESY, and inversely detected ¹³C or ¹⁵N
spectra. A high resolution magic angle spinning
(HR-MAS) STD experiment has been developed
to study the binding of ligands to a
protein immobilized on a solid support. HR-MAS
STD NMR provides a way of obtaining
ligand-binding information for proteins that
are difficult to work with in solution because
of either poor solubility or conformational
changes (161). In addition to screening
ligands for binding to a protein (160, 162, 163),
the binding epitope of a molecule can also be
determined by examining the intensities of ligand
resonances (164-166). The protons giving
the strongest STD signals correspond
to those that are part of the ligand's
binding epitope. For example, it was shown
that when methyl β-D-galactoside (32) bound
to Ricinus communis agglutinin I, the H2, H3,
and H4 protons were saturated to the highest degree
(values on structure indicate relative signal
strengths) and hence were in close proximity
to the protein protons. This analysis was subsequently
extended to the decasaccharide NA2
(33) and demonstrated that the Gal-6' and
GlcNAc-5' residues bind edge-on to the protein,
with the binding contribution of the terminal
galactose residue being the greater
(165).

[Structure (32), annotated with relative STD signal strengths of 100%, 63%, and 42%]

STD NMR studies have also been per¬ 
formed on membrane-bound receptors by em¬ 
bedding the protein in the phospholipid bi¬ 
layer of a liposome (166). 

There are a number of advantages in using
the STD experiment for the detection of ligand
binding. The saturation transfer effect is an
efficient process, which results in high sensitivity,
and hence only small quantities of protein
are required (nanomolar concentrations
of a protein with MW > 10 kDa) (160, 165). In
addition, protein size is noncritical; in fact, as
the protein becomes larger, the saturation
transfer effect becomes more efficient. The acquisition
time for each experiment is also
quite short and, because the experiment is ligand
observed, no deconvolution of mixtures
is required, making this a good technique for
high throughput screening of large ligand libraries.
Unlike the chemical-shift perturbation
techniques, STD experiments provide no
information on the site of ligand binding.

[Structure (33)]

A second variation of the saturation transfer
experiment has been devised by Dalvit and coworkers
that uses the transfer of magnetization
from the water (167). Water is intimately
associated with proteins, being bound either
within or on the surface of the macromolecular
structure. Saturation of the water resonance
will lead to protein saturation through a
variety of mechanisms, including saturation
of the αH resonances, saturation of exchanging
protein resonances, and NOE interactions
between water and the protein. If a compound
is bound to the protein it will also become saturated,
and this effect can be used as an indication
of ligand binding (167).

4.1.3 Molecular Diffusion. Molecules can 
be distinguished based on their diffusion coef¬ 
ficients, which are related to molecular size. 
Large macromolecules, such as proteins, dif¬ 
fuse more slowly than small molecules and it is 
this size difference that can be exploited to 
screen for ligand binding. If a small molecule 
binds to a protein target its diffusion coeffi¬ 
cient is altered to a value more like that of the 
protein. Therefore, by utilizing a diffusion fil¬ 


ter, resonances generated by small molecules 
that do not bind to the protein can be removed 
from the spectrum. 

Diffusion editing is achieved with the use of
a pair of gradient pulses. If field inhomogeneity
is ignored, then all spins experience an identical
magnetic field despite having different positions
throughout the sample. The application
of a field gradient has the effect of making
field strength dependent on position. Under
the influence of the gradient pulse, the phase
of individual spins becomes dependent on their
position within the sample and hence the
spins are spatially "encoded." If diffusion does
not occur, this spatial encoding is fully reversible
by a second gradient of inverse polarity
and no loss of NMR signal will occur. However,
the second gradient pulse will be unable
to "decode" the spins that have undergone diffusion
and the resulting NMR signal will be
reduced. Acquiring spectra of a sample with
and without the diffusion filter and then subtracting
them allows the ligands binding to
the protein to be identified. This filtering
method can be used for both 1D and 2D experiments
and can be "tuned" by altering the
strength and duration of the gradients.
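
How the gradient strength and duration "tune" the filter can be illustrated with the standard Stejskal-Tanner expression for rectangular gradient pulses, I/I0 = exp(-D (γ g δ)² (Δ - δ/3)). This expression is not given in the text and is used here as an assumption; the diffusion coefficients and gradient settings are illustrative values only.

```python
import math

GAMMA_H = 2.675e8  # proton gyromagnetic ratio, rad s^-1 T^-1


def gradient_attenuation(diffusion_coeff, gradient, delta, big_delta):
    """Signal fraction surviving a gradient pair (Stejskal-Tanner, rectangular pulses).

    diffusion_coeff in m^2/s, gradient in T/m, delta (pulse length) and
    big_delta (diffusion delay) in seconds.
    """
    b_value = (GAMMA_H * gradient * delta) ** 2 * (big_delta - delta / 3.0)
    return math.exp(-diffusion_coeff * b_value)


# Assumed, order-of-magnitude diffusion coefficients.
free_ligand = gradient_attenuation(5e-10, gradient=0.5, delta=2e-3, big_delta=0.1)
protein_like = gradient_attenuation(1e-10, gradient=0.5, delta=2e-3, big_delta=0.1)
print(f"free ligand: {free_ligand:.3f}, protein-like: {protein_like:.3f}")
```

With these settings the fast-diffusing species retains only a few percent of its signal while the slowly diffusing species keeps about half, so stronger or longer gradients sharpen the discrimination at the cost of overall sensitivity.
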

Because the ligand signals are being observed
in this screening method, no deconvolution
of the ligand mixture is required, given
that any signals can be assigned directly to
individual compounds within the mixture.
However, signals from the protein are always
present, which can pose a problem in interpreting
spectra. An isotope-edited version of
the diffusion experiment has been designed to
avoid this problem, although labeled protein is
required (168). Generally, there is no requirement
for labeling of the protein target or for
the protein resonances to be assigned and
thus, in theory, there is no size limit on the
proteins that can be screened by use of this
method, although no information is obtained
on the location of ligand binding. However, if
the protein is large, then the transverse relaxation
time may be too short to observe the
bound ligands in the diffusion-edited spectrum
(169). Only one sample, containing protein
and ligands, is used to obtain both reference
and screening data and therefore
differences between the sample and reference
spectra caused by addition of the ligands (pH,
salt concentration, etc.) are avoided.

Diffusion-filtered NMR screening requires 
that there is a significant difference in ob¬ 
served translational diffusion between the 
free and bound states. The ligands are in fast 
exchange on the diffusion timescale and as a 
consequence the observed diffusion coefficient 
for binding ligands is an average between the 
free and bound diffusion values. Free ligands 
diffuse at a much faster rate than those in the 
bound state and thus only a small amount of 
free ligand has a considerable effect on the 
observed average diffusion coefficient. This ef¬ 
fect may be significant enough to reduce the 
difference between binding and nonbinding li¬ 
gands, making it more difficult to interpret 
results (169). It has also been demonstrated 
that chemical exchange and NOE can affect 
the interpretation of diffusion experiments 
and that these factors need to be taken into 
consideration (170,171). 
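
The averaging effect described above can be made concrete with a two-state, fast-exchange sketch in which the observed diffusion coefficient is the population-weighted mean of the free and bound values. The diffusion coefficients below are assumed, order-of-magnitude numbers.

```python
def observed_diffusion(fraction_bound, d_free, d_bound):
    """Population-weighted diffusion coefficient under fast exchange (m^2/s)."""
    return fraction_bound * d_bound + (1.0 - fraction_bound) * d_free


d_free = 5e-10   # assumed small-molecule value
d_bound = 1e-10  # assumed protein-like value

mostly_bound = observed_diffusion(0.9, d_free, d_bound)
half_bound = observed_diffusion(0.5, d_free, d_bound)
print(f"90% bound: {mostly_bound:.2e} m^2/s ({mostly_bound / d_bound:.1f}x the bound value)")
print(f"50% bound: {half_bound:.2e} m^2/s")
```

Even with 90% of the ligand bound, the observed value is already 40% larger than that of the complex, which illustrates why excess free ligand erodes the contrast between binding and nonbinding compounds.
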

Shapiro and coworkers developed a meth¬ 
odology based on diffusion filtering, named 
"affinity NMR," that they have used to screen 
for binding (172-175). Diffusion-edited NMR 
experiments were able to identify two known 
binding tetrapeptide ligands of vancomycin 
from a mixture of 10 peptides (176). Hajduk et 
al. demonstrated the application of diffusion¬ 
editing experiments by differentiating ligands 


of stromelysin from a mixture containing non¬ 
binding compounds (177). 

4.1.4 Relaxation. Like diffusion, the transverse
relaxation time (T2) of molecules is also
dependent on molecular size. Large molecules,
such as proteins, have a short T2 and hence
exhibit broad NMR signals, whereas small
molecules have a longer T2 and hence narrower
line widths. Therefore, if a small-molecule
ligand binds to a protein, its T2 value will
decrease and a line-broadening effect of bound
ligand signals can be observed. Alternatively,
a relaxation filter can be used to remove signals
from molecules with a short T2 value.
Subtraction from a reference spectrum will result
in a spectrum containing only those ligands
that bind to the protein.
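
Both effects, line broadening and removal by a relaxation filter, follow from simple relations: the Lorentzian line width at half height is 1/(πT2), and transverse magnetization decays as exp(-t/T2) during a filter of length t. The T2 values and filter length in the sketch below are assumptions chosen only to show the contrast.

```python
import math


def linewidth_hz(t2_seconds):
    """Lorentzian line width at half height for a given T2."""
    return 1.0 / (math.pi * t2_seconds)


def surviving_fraction(t2_seconds, filter_seconds):
    """Fraction of transverse magnetization left after a T2 filter."""
    return math.exp(-filter_seconds / t2_seconds)


free_t2 = 1.0      # assumed T2 of a free small molecule, s
bound_t2 = 0.02    # assumed effective T2 of a binding ligand, s
filter_time = 0.2  # assumed filter length, s

for label, t2 in (("free", free_t2), ("binding", bound_t2)):
    print(
        f"{label}: line width {linewidth_hz(t2):.1f} Hz, "
        f"signal after filter {surviving_fraction(t2, filter_time):.4f}"
    )
```

Under these assumptions the nonbinding compound keeps more than 80% of its signal, whereas the binding ligand is essentially removed, so it appears in the difference with the unfiltered reference.
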

The ability to identify binding ligands using
relaxation filters has been demonstrated
using FKBP. A mixture of nine compounds
consisting of one known ligand of FKBP,
2-phenylimidazole (34), and eight nonbinding
compounds (e.g., 35-37) was screened and
only signals from (34) were observed (177).

[Structures (34)-(37)]

4.1.5 NOE. NOE experiments can also be
used to identify ligands that bind to protein
targets (178-180). Small molecules have a fast
tumbling rate and, as a consequence, generally
exhibit small positive NOEs. In contrast,
large molecules such as proteins generate
strong negative NOEs because of their slow
tumbling rate. When a small molecule binds
to a protein, its tumbling rate is slowed to that
of the protein and it exhibits strong negative
NOEs. On dissociation, these are transiently
retained and are known as transfer NOEs
(TrNOEs) (Fig. 12.33). TrNOEs and those
arising directly from the free ligand can be
distinguished by the rate of signal buildup.
Transfer NOEs accumulate significantly
faster and therefore can be selected for by use
of shorter mixing times in the NOE experiment
(179).

Figure 12.33. A schematic representation of the
TrNOE experiment used to detect ligand binding
(180). The free ligand (white ellipse) exhibits only
small positive NOEs, although binding to the large
protein target results in the generation of large negative
TrNOEs. The appearance of these large negative
TrNOE signals can be used to identify ligands
within a mixture that are binding to the protein and
also provide some information on the bound conformation
of the ligand. [Panel labels: free ligand, small
positive NOEs; bound ligand, large negative NOEs;
relaxation; detection of large negative trNOEs]

In practice, a 2D NOESY spectrum of the
mixture of potential ligands in the absence of
protein is recorded and all molecules exhibit
small positive NOEs. The experiment is then
repeated in the presence of protein and molecules
that bind display negative TrNOEs. Subtraction
of the two spectra provides signals
arising from only those compounds that bind.
These TrNOEs can be interpreted to provide
information regarding the bound conformation
of the active ligands. However, when analyzing
the conformational data, care must be
taken to ensure that the ligands are in fast
exchange and that the observed TrNOEs are
not affected by contributions from spin diffusion
(179). Relative binding affinities between
ligands can also be determined by comparison
of TrNOE signal strength but, again, the fast-exchange
regime and spin-diffusion effects
need to be taken into account (178, 179). If all
ligands are in fast exchange, the stronger
binding ligands occupy more binding sites and
thus give larger TrNOE intensities. Because
of the need for an averaging effect, brought
about by fast chemical exchange, TrNOE experiments
are limited to those ligands with a
KD value from 10⁻³ to 10⁻⁷ M. Because the spectral
properties of the excess ligand in solution are
governed by the small fraction of bound molecules,
sensitivity is greatly enhanced (178).

Transfer NOE experiments have been used
to identify a bioactive disaccharide from a library
of 15 mono- and disaccharides that
bound to Aleuria aurantia agglutinin (179).
Another study has described the identification
of a sialyl Lewis mimetic (38) that binds to
E-selectin from a library of 10 compounds
(178). As well as being used to detect binding,
TrNOEs may also be used to determine bound
ligand conformations, as described earlier in
this chapter.

[Structure (38)]
A second technique that uses NOEs to detect
binding is NOE pumping. This method
was designed to alleviate some of the problems
associated with the diffusion-edited screening
methods (169). Signals from ligand molecules
are removed using a diffusion filter and then
transfer of signal from the protein to bound
ligands by NOE occurs. The inverse of this is
possible (known as reverse NOE pumping),
which uses a relaxation filter to attenuate the
protein resonances, after which the signal is
transferred from bound ligands to the protein by NOE.
Ligands may lose signal either by relaxation (for a free
ligand) or through relaxation and NOE transfer
(for a bound ligand). Therefore, by subtracting
spectra (which is done internally to
reduce subtraction artifacts) from experiments
with and without NOE pumping to the
protein, the binding ligands can be detected
(181).

The ability of NOE and reverse NOE 
pumping to identify ligands has been demon¬ 
strated through the use of human serum albu¬ 
min (HSA) and several known binding and 
nonbinding compounds (169,181). 

4.1.6 Spin Labels. Spin-spin relaxation rates
are proportional to the product of the squares
of the gyromagnetic ratios of the involved
spins. The gyromagnetic ratio of an unpaired
electron is significantly larger than that of a
proton and therefore any spins influenced by
this electron will have substantially shortened
relaxation times. The resonances of protons
that are within 15-20 Å of the unpaired
electron will experience this effect and be significantly
broadened. The introduction of a
short spin-lock period will significantly reduce
the intensity of, or quench, these signals.
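
The size of this effect can be estimated from the gyromagnetic ratios alone. The sketch below compares a proton-electron pair with a proton-proton pair at the same distance and correlation time, and then applies the r⁻⁶ distance dependence of dipolar relaxation; the r⁻⁶ law is a standard assumption brought in here rather than something stated in the text, and differences in correlation time are ignored.

```python
GAMMA_ELECTRON = 1.7609e11  # magnitude of the electron gyromagnetic ratio, rad s^-1 T^-1
GAMMA_PROTON = 2.6752e8     # proton gyromagnetic ratio, rad s^-1 T^-1

# Dipolar relaxation scales with the product of the squared gyromagnetic ratios,
# so swapping a proton partner for an electron scales the rate by (ratio)^2.
ratio = GAMMA_ELECTRON / GAMMA_PROTON
rate_scaling = ratio ** 2


def relative_effect(distance_angstrom, reference_angstrom=10.0):
    """Relaxation enhancement relative to a reference distance, assuming r^-6."""
    return (reference_angstrom / distance_angstrom) ** 6


print(f"gamma_e / gamma_H = {ratio:.0f}, rate scaling = {rate_scaling:.2e}")
print(f"effect at 20 A relative to 10 A: {relative_effect(20.0):.4f}")
```

The several-hundred-thousand-fold scaling is what allows an unpaired electron to broaden proton signals over distances far beyond the reach of proton-proton NOEs.
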

The spin-label method can be used either as
a primary screening method or to identify a
second ligand-binding site. The primary
screening method requires residues around
the binding pocket of the target to be spin labeled.
Residues suitable for this labeling include
lysine, cysteine, histidine, glutamate,
aspartate, tyrosine, and methionine. Any ligands
that bind to the protein in close proximity
to the spin-labeled residues can then be
identified. To screen for second-site ligand
binding, the known first-site binding ligand is
spin labeled. A reduced signal will be observed
for any ligands that bind simultaneously and
in close proximity to the first ligand-binding
site. In addition, the degree of reduction in
signal intensity gives an indication of the orientation
of the second ligand in relation to the
first, given that the effect of the spin label
falls off steeply with the distance separating
the electron and proton (as r⁻⁶ for dipolar
relaxation). This information
is valuable in the design of linkers to join the
two ligands.

There are several advantages to using the
spin-label screening method. Currently, it is
the only method that can detect ligands that
bind to the protein simultaneously, unlike
other methods that can produce false positives
if the first ligand-binding site is not fully saturated.
The concentration of protein required
for screening is relatively small (~10 µM) because
of the substantial enhancement of the
relaxation rate by the spin label. The protein
can also be unlabeled and partially purified
and there is no molecular weight limit. The
spin labels also quench protein signals, making
interpretation of spectra easier. The experiment
is easy to set up and analyze, making
it amenable to automation. It is also insensitive
to small changes in solvent conditions
that can generate false positives in other
methods. The information obtained on the orientation
of ligands is also valuable and makes
it an alternative to the chemical-shift perturbation
methods when the proteins are large
and NMR assignments have not been made.

A disadvantage of the method is the requirement
for spin-labeled proteins and ligands.
In addition, any ligands with slow dissociation
rates will show no averaging of
relaxation rates and therefore tightly binding
compounds (Kd < 10⁻⁶ M) will produce false
negatives. Protein spin labeling must occur
adjacent to, but not within, the binding site to
minimize alteration of its binding properties.

The antiapoptotic protein Bcl-xL is responsible
for the reduced susceptibility of cancer
cells to undergo apoptosis and is therefore a 
target for the development of new anticancer 
agents. The structure of a previously identi¬ 
fied ligand for Bcl-xL (39) was modified to in¬ 
corporate a TEMPO spin label (40). By use of 
spin-labeled (40), an eight-compound library 
was screened for simultaneous binding to Bcl- 
xL. From this library an aromatic ketoxime 
(41) was identified as binding simultaneously
with and in the vicinity of (40). Analysis of 
relaxation enhancements revealed that the 
protons around the indole ring were closest to 
the spin label. 

4.2 Practical Considerations 

4.2.1 Screening Approach. The first step in 
using NMR screening is to select a suitable 
screening method for the target protein being 
used. Table 12.8 lists the characteristics of 
each NMR screening method. The choice of 
experiment will be determined by the characteristics
of the protein target and the information
that is desired from the screen. For example,
the SAR-by-NMR method is suitable only
for small, easily expressed proteins because
the NMR assignments for the target need to be
known so the location of binding can be determined
and a large amount of ¹⁵N-enriched
protein is required. If a simple “yes/no” answer
on ligand binding is wanted, then the
shorter, less resource-intensive ligand-observed
experiments (e.g., STD, diffusion-edited,
or TrNOE) may suffice.

It is also important to determine the cor¬ 
rect NMR solvent conditions for the screening 
procedure. These should facilitate good solu¬ 
bility, with little precipitation or aggregation, 
and acquisition of good quality data; maintain 
protein structure and activity; and provide a 
sufficient buffering effect to allow for ligands 
to be added. Two methods that allow a range
of solvent conditions to be screened without
the need for a large amount of protein are the
microdialysis button test (182) and the
microdrop-screening method (183). A recent review
(184) provides a more in-depth discussion of
solvent conditions for NMR.

4.2.2 Library Design. Effective design and
management of the ligand library to be used 
for screening is essential if successful results 
are to be obtained. The major considerations 
in library design are only briefly described 
here. There are a number of reviews that pro¬ 
vide a more in-depth discussion of library de¬ 
sign (15,185). 

4.2.2.1 Ligand Properties. Diversity of li¬ 
gands is an important factor to consider in the 
design of a library for NMR screening and 
there are a number of factors to take into con¬ 
sideration. Although it would seem logical to 
maximize diversity, this may not always be the 
most efficient approach. If the system being 
studied exhibits neighborhood behavior, then 
maximizing diversity is a good option. Neigh¬ 
borhoods are regions of multidimensional mo¬ 
lecular space defined by a set of molecular de¬ 
scriptors. By the choice of a molecule that is in 
the center of a neighborhood, it is possible, in 
theory, to represent all molecules within that 
molecular space. By spreading out the molecules
that are selected for the library so that
each neighborhood does not overlap, diversity
is maximized.

Figure 12.34. Examples of molecular frameworks from the
SHAPES library.

However, if neighborhoods are only small 
then compound libraries must be very large so 
that the neighborhoods overlap and hence all 
molecular space is covered. In addition, some 
systems do not exhibit neighborhood behavior 
and relatively small changes to the structure 
of a compound may lead to large changes in its 
binding affinity for the target. Maximizing di¬ 
versity may also be inefficient because many 
molecules do not possess physicochemical 
characteristics that are suitable as the basis 
for a drug. In practice, the more that is known 
about the drug target, the less diverse and 
more focused the library can be. However, if 
the library is too focused then some outlying
“new” ligand types for the target being
screened may be missed.

One strategy for library design is to select 
compounds that have druglike characteristics. 
A simple set of rules, determined by Lipinski 
and coworkers, for determining whether a 
compound is druglike is known as "the rule of 
5." According to this set of criteria, the major¬ 
ity of orally available drugs have five or fewer 
hydrogen bond donors, 10 or fewer hydrogen 
bond acceptors, a log P of less than 5, and a 
molecular weight less than 500 (186). Addi¬ 
tional factors that can be taken into consider¬ 
ation include the number of heavy atoms, ro¬ 
tatable bonds, and ring systems (187-189). 
Another study has revealed that there are a 
number of frameworks and side-chains that 
commonly occur in many drugs. Drug mole¬ 
cules, from the comprehensive medicinal data¬ 
base, were broken down into systems consist¬ 
ing of frameworks (Fig. 12.34) and side- 
chains. Analysis of these two structural 
features revealed that approximately 50% of 


all known drugs could be represented by only 
32 different frameworks. When atom type and 
bond order were included in the analysis, 41 
frameworks were found to describe 24% of all 
drugs (190). A similar analysis of side-chain 
frequency indicated that approximately 70% 
of all side chains present in the compound da¬ 
tabase analyzed were from the top 20 occur¬ 
ring side chains (191). 
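
The rule-of-5 criteria quoted above lend themselves to a simple library filter. The following is a minimal sketch in Python, assuming the four descriptors have already been computed by some other tool; the class name, function names, and the example descriptor values are hypothetical and are not taken from the cited work.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptors:
    """Precomputed descriptors for one candidate library compound."""
    h_bond_donors: int
    h_bond_acceptors: int
    log_p: float
    molecular_weight: float


def rule_of_five_violations(d: Descriptors) -> List[str]:
    """Return the rule-of-5 criteria (as stated in the text) that a compound fails."""
    violations = []
    if d.h_bond_donors > 5:          # "five or fewer hydrogen bond donors"
        violations.append("more than 5 hydrogen bond donors")
    if d.h_bond_acceptors > 10:      # "10 or fewer hydrogen bond acceptors"
        violations.append("more than 10 hydrogen bond acceptors")
    if d.log_p >= 5:                 # "a log P of less than 5"
        violations.append("log P not below 5")
    if d.molecular_weight >= 500:    # "a molecular weight less than 500"
        violations.append("molecular weight not below 500")
    return violations


def is_druglike(d: Descriptors, max_violations: int = 0) -> bool:
    """Accept a compound if it fails no more than max_violations criteria."""
    return len(rule_of_five_violations(d)) <= max_violations


# Hypothetical descriptor values, chosen only to exercise the filter.
small_fragment = Descriptors(h_bond_donors=2, h_bond_acceptors=4,
                             log_p=1.8, molecular_weight=220.0)
greasy_giant = Descriptors(h_bond_donors=1, h_bond_acceptors=12,
                           log_p=6.3, molecular_weight=710.0)

print(is_druglike(small_fragment))
print(rule_of_five_violations(greasy_giant))
```

The `max_violations` parameter is a design choice for illustration only: a strict library would keep it at zero, whereas a more permissive one might tolerate a single failed criterion.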

The presence of these common frameworks 
and side-chains has been exploited in the 
SHAPES methodology (7) for NMR screening. 
This strategy employs a small focused library 
based on these common frameworks and side 
chains to screen against protein targets 
through the use of relaxation and NOE exper¬ 
iments. The advantages of this approach are 
that the library is small and hence only rela¬ 
tively small amounts of protein are required 
and any hits from the library will possess 
druglike characteristics. However, a disadvan¬ 
tage of the method is that it is unlikely to yield 
new drug types, given that the library is based 
on known drug frameworks. 

Diversity of molecular type is not the only 
factor that must be taken into account when 
designing a library to be used in an NMR 
screening program. Because the screening oc¬ 
curs in an aqueous solution, the organic com¬ 
pounds chosen for the library must demon¬ 
strate reasonable solubility in the aqueous 
conditions used. In general, compounds are 
dissolved in DMSO and then added to the pro¬ 
tein solution at the appropriate concentration. 
Currently, there are no good methods for de¬ 
termining the solubility of a wide range of 
compounds before screening commences. A 
simple method is to dilute the DMSO solution 
in buffer and observe whether any precipita¬ 
tion or aggregation occurs. However, this 
method will not be suitable for compounds 
that precipitate or aggregate over several
hours, and solutions that appear clear may 
still contain high MW aggregates, which will 
cause false positives in experiments such as 
the diffusion, relaxation, and TrNOE methods 
(15,185). 

It is also preferable to choose ligands that 
are synthetically accessible and/or possess 
suitable moieties to build upon or link to other 
fragments. This is especially important in the 
SAR-by-NMR screening methodology because 
this relies on the ability to link individual frag¬ 
ments to form a more potent drug lead. If the 
ligands to be linked are not synthetically ac¬ 
cessible or do not possess suitable linking 
functional groups then this process is severely 
hindered. 

4.2.2.2 Mixture Design. The optimal num¬ 
ber of compounds per mixture is dependent on 
the screening method. For ligand-observed ex¬ 
periments the limiting factor for the number 
of compounds in a mixture is spectral overlap. 
Ligands need to be chosen so that spectral 
overlap is minimized, making interpretation 
of the resulting data far simpler. In theory, 
protein-observed experiments could have a 
large number of compounds per mixture that 
would both minimize screening time and the 
requirement for large amounts of protein. 
However, because the experiments are protein 
observed then deconvolution of the mixtures 
and rescreening of each individual compound 
are required to identify any hits. Therefore, 
the number of compounds per mixture is de¬ 
pendent on the hit rate in the screening pro¬ 
cedure, given that the greater the hit rate, the 
more deconvolution steps required and conse¬ 
quently more protein and spectrometer time 
are needed. The number of experiments re¬ 
quired is at a minimum when the number of 
compounds is equal to $1/(\text{hit rate})^{1/2}$; thus,
with a hit rate of 10% the optimal number of 
compounds per mixture is three (185). In ad¬ 
dition to these factors, if the hit rate is high 
then it is likely that several compounds within 
a mixture containing a large number of com¬ 
pounds may compete for the same binding 
pocket, which may lead to false negatives. 
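
The optimum quoted above can be reproduced with a short calculation. The sketch below is in Python and rests on a simplified cost model that is an assumption of this example, not a statement from the cited review: each mixture costs one experiment, hits are rare enough that each falls in a separate mixture, and every positive mixture is deconvoluted by rescreening all of its members individually. Under that model the experiments per library compound are 1/n + (hit rate) × n, which is smallest at n = 1/(hit rate)^(1/2). The function names are hypothetical.

```python
import math


def experiments_per_compound(mixture_size: int, hit_rate: float) -> float:
    """Screening cost per library compound under the simplified model.

    1/mixture_size      : one experiment per mixture, shared by its members
    hit_rate * size     : each hit triggers rescreening of its whole mixture
    """
    return 1.0 / mixture_size + hit_rate * mixture_size


def optimal_mixture_size(hit_rate: float) -> float:
    """Continuous optimum of the cost model: 1 / sqrt(hit rate)."""
    return 1.0 / math.sqrt(hit_rate)


def best_integer_mixture_size(hit_rate: float, max_size: int = 100) -> int:
    """Whole-number mixture size with the lowest cost per compound."""
    return min(range(1, max_size + 1),
               key=lambda n: experiments_per_compound(n, hit_rate))


for rate in (0.10, 0.01):
    print(rate,
          round(optimal_mixture_size(rate), 2),
          best_integer_mixture_size(rate))
```

For a 10% hit rate the continuous optimum is about 3.2, and the best whole-number mixture size is three, in agreement with the value given in the text.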

In mixtures of organic compounds, interactions
between compounds, such as reactions or ion
pairing, are also possible and should be taken
into consideration,


especially when one uses large numbers of 
compounds per mixture. It has been demon¬ 
strated that in random mixtures of 10 com¬ 
pounds in DMSO, the probability of a reaction 
occurring between two of the mixture's com¬ 
ponents is 26%. This value can be reduced by 
careful selection of mixture components (e.g., 
separating acids from bases) to approximately 
9% (192). 

4.2.3 Hardware and Automation. Automa¬ 
tion is a requirement if libraries containing a 
large number of compounds are to be 
screened. Technology has been developed that 
allows the automation of almost all steps of 
the NMR screening process from sample prep¬ 
aration through to data analysis (193). 

The general setup for NMR screening con¬ 
sists of a robot for just-in-time preparation of 
each sample, which is then transferred to the 
magnet either through a flow system or as discrete
samples on a rail system. There are several
disadvantages in using a flow system:
samples can be contaminated by previously
screened compounds, the capillary line can be
blocked if the protein or ligands precipitate
or form aggregates, recovery of the sample is
more laborious because it has been diluted,
and cryoprobe technology (discussed later in
this section) is not yet available for the flow
system. Many of these problems can be overcome
by using discrete samples with the rail system.

Data acquisition is easily automated and
there are several software packages that will
automate data processing for 2D spectra. Automatic
processing of 1D spectra is reported to be less
reliable because of the large solvent signal, and
usually requires manual adjustment of phasing (193).
One of the most laborious tasks in NMR screening is
the analysis and comparison of the resulting spectra.
For 1D ligand-observed experiments, difference
methods (e.g., STD) provide the most reliable
means of interpreting results, in that the signals
present in the spectra correspond to the ligands
that are binding. In 2D protein-observed experiments
(e.g., ¹H/¹⁵N HSQC) a more statistically rigorous
analysis of changes in chemical shift is required,
and a discussion of this is beyond the scope of
this chapter. A more in-depth account of data
analysis is provided by Ross and Senn (193).

Currently, approximately 50-100 samples
can be screened per day, and if mixtures contain
10 compounds each this provides a substantial
throughput. This throughput rate will
increase as technology improves, as has been
demonstrated by the use of cryoprobes. Cryogenic
NMR probes, in which the preamplifier
and radio frequency coils are cooled to low
temperatures, can significantly increase the
signal-to-noise ratio of an NMR spectrum. With
these probes, NMR data can be obtained much
faster and at lower protein concentrations,
which increases throughput and reduces the total
amount of protein needed to screen a library.
Hajduk and coworkers (21) demonstrated the
substantial improvements made through the
use of a CryoProbe instead of a conventional
probe in ¹H/¹⁵N chemical-shift perturbation
screening. Stromelysin (50 μM) was screened
against mixtures of 100 compounds (50 μM
each), facilitating the screening of more than
10,000 compounds in one day. The use of lower
concentrations of both protein and ligands increases
the stringency levels for the binding
strength of ligands. At a protein/ligand concentration
of 0.5 mM, ligands with dissociation
constants in the millimolar range can be
detected, whereas at a protein/ligand concentration
of 50 μM this dissociation constant
limit is reduced to approximately 0.15 mM.
Although using higher protein/ligand concentrations
can be advantageous when screening
libraries containing small, low-affinity ligands,
a higher stringency is required when screening
large libraries, to reduce the number of
hits obtained to a manageable number (21).

5 CONCLUSIONS 

In this chapter we have given an overview of 
the two major approaches used in NMR and 
drug discovery, structure-based design and 
NMR-based screening. Both areas are flour¬ 
ishing and, together with more traditional 
uses of NMR, they demonstrate the versatility 
of NMR as a tool in medicinal chemistry. The 
power of NMR has been dramatically en¬ 
hanced over the last decade by developments 


in both instruments and methodology. On the 
instrumental side, increases in magnetic field 
strengths and the development of cryoprobes 
have greatly increased sensitivity. Linkages of 
NMR to LC and MS have increased versatility. 
On the methods front there have been a range 
of new approaches discovered that will en¬ 
hance the study of larger molecular com¬ 
plexes. Advances in protein expression and la¬ 
beling have played a major role in stimulating 
the development of new NMR pulse sequences 
to extract information from such complexes. 
1 INTRODUCTION

At the beginning of the 20th century, mass 
spectrometers were invented to help physi¬ 
cists and physical chemists prove the existence 
of isotopes of the elements. As radioactivity 
and nuclear physics were explored, specialized
mass spectrometers were used to characterize 
the fission products of radioactive elements as 
they were created or discovered. In addition, 
mass spectrometers were used for the mea¬ 
surement of isotopic enrichment of radioac¬ 
tive elements, their inorganic derivatives, and 
even the isotopic purification of radioactive el¬ 
ements as inorganic compounds. As this era of 
mass spectrometry reached maturity in the 
1940s, some physicists announced that there 
would no longer be any need for mass spec¬ 
trometry because virtually all of the elements 
had been discovered and characterized. Of 
course, these prognosticators were wrong be¬ 
cause the entire field of organic mass spec¬ 
trometry was about to begin. 

While mass spectrometers were being used 
for the purification of fissionable material for 
atomic weapons as part of the Manhattan 
Project of World War II, organic mass spec¬ 
trometry was being invented for the analysis 
and quality control of aviation fuel. In 1945, 
the application of mass spectrometry to or¬ 
ganic chemistry emerged as a productive new 
area of research and discovery. Commercial 
production of organic mass spectrometers be¬ 
gan immediately, and petroleum companies 
became the first customers for these new ana¬ 
lytical instruments. Early commercial mass 
spectrometers used electron impact (EI) ionization
(see Equations 13.1 and 13.2) to generate
ions from gas-phase molecules that were
separated by acceleration through an electro¬ 
magnetic field provided by either a fixed mag¬ 
net or an electromagnet. After separation, the 
ions were detected using a simple impact de¬ 
tector such as a Faraday cup. This basic design 
is still in use today for the identification and 
quantitative analysis of volatile organic com¬ 
pounds. 


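$$\mathrm{M} + e^{-}\ (70\ \mathrm{eV}) \rightarrow \mathrm{M}^{+\bullet} + 2e^{-} \qquad \text{formation of positive molecular ions using EI ionization} \tag{13.1}$$

$$\mathrm{M} + e^{-}\ (2\text{-}10\ \mathrm{eV}) \rightarrow \mathrm{M}^{-\bullet} \qquad \text{electron capture EI ionization} \tag{13.2}$$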

Toward the late 1950s, organic mass spec¬ 
trometers began to be used for the analysis of 
a wider variety of organic molecules and even¬ 
tually became a fundamental analytical tool 
for the characterization of synthetic organic 
compounds. Today, mass spectrometers are 
used routinely to confirm the molecular 
weights of organic compounds and to verify 
their structures based on fragmentation pat¬ 
terns. Fragmentation results from the cleav¬ 
age of chemical bonds within an ion, resulting 
in the formation of a product ion of lower mass 
and one or more neutral products. Of course,
only the fragment ions and not the neutral
species are detected in a mass spectrometer
because this instrument measures the mass-to-charge
ratio (m/z) of ions in the gas phase.
The energy for fragmentation comes from
excess energy imparted to the molecular ion during
ionization or during a process known as collision-induced
dissociation (CID), which will be discussed
along with tandem mass spectrometry (MS-MS)
below. Because the fragmentation pattern
reflects the relative strengths of chemical
bonds in a compound, mass spectra (plots of
ion relative abundance versus m/z) provide
structurally significant fragment ions for compound
identification. Rules for elucidating
chemical structures through the interpretation
of mass spectra have been
developed. (For a review of EI and ion fragmentation
pathways, see McClafferty et al.
1997, Section 4).

Table 13.1 Types of Mass Spectrometers and Tandem Mass Spectrometers

Instrument             Resolving Power   m/z Range    Tandem MS
Magnetic sector        100,000           12,000       Low resolution
Quadrupole             < 4,000           4,000        none
Triple quadrupole      < 4,000           4,000        Low resolution
Time-of-flight (TOF)   15,000            > 200,000    none
FTICR                  > 200,000         < 10,000     MS^n, high resolution
Ion trap               < 4,000           < 10,000     MS^n, low resolution
QTOF                   12,000            4,000        High resolution
TOF-TOF                15,000            > 10,000     High resolution

In many cases, EI imparts so much excess
energy into a molecule that only fragment ions
and no molecular ions are produced. Therefore,
"softer" ionization techniques were developed
to enhance molecular weight information.
The first of these ionization methods was
chemical ionization (CI). Developed by researchers
in the petroleum industry (1), CI became
another standard ionization technique
for organic mass spectrometry. During CI,
high energy electrons (as in EI) are used to
ionize a gas called a reagent gas at a constant
pressure (usually ~1 Torr) in the mass spectrometer
ionization source. The reagent gas in
turn ionizes the sample molecules through
ion-molecule reactions that usually involve
the exchange of protons. Less frequently, sample
molecule ionization might involve a charge
exchange. Two of the most common ionization
mechanisms in CI are summarized in Equations
13.3 and 13.4.


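$$\mathrm{M} + \mathrm{RH}^{+} \rightarrow \mathrm{MH}^{+} + \mathrm{R} \qquad \text{CI through proton transfer, R = reagent gas} \tag{13.3}$$

$$\mathrm{M} + \mathrm{R}^{+\bullet} \rightarrow \mathrm{M}^{+\bullet} + \mathrm{R} \qquad \text{CI through charge exchange} \tag{13.4}$$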


During the 1960s, high resolution double-focusing
magnetic sector instruments became
available and are now standard tools for the
determination of elemental compositions using
a type of analysis called exact mass measurement.
In mass spectrometry, resolution is
defined as M/ΔM, where M is the m/z value of a
singly charged ion, and ΔM is the difference
(measured in m/z) between M and the next
highest ion. Alternatively, ΔM may be defined
in terms of the width of the peak. High resolution
is typically regarded as a value of at least
10,000. At this resolution, the molecular ions
of most drug-like molecules (that is, compounds
with molecular weights less than
~500) can be resolved from each other. After
resolving a sample ion from others in a mass
spectrum, an exact mass measurement may be
carried out by accurately weighing the unknown
ion and comparing its m/z value to that
of a calibration standard. Since the 1960s,
other types of mass spectrometers capable of 
high resolution exact mass measurements 
have become available as commercial prod¬ 
ucts, including Fourier transform ion cyclo¬ 
tron resonance (FTICR) mass spectrometers, 
reflectron TOF instruments, and recently, 
quadrupole time-of-flight hybrid (QqTOF) 
mass spectrometers (see Table 13.1 for a list¬ 
ing of types of organic mass spectrometers and 
a comparison of their performance character¬ 
istics). By the early 2000s, FTICR and QqTOF 
instruments became more popular than mag¬ 
netic sector mass spectrometers for exact 
mass measurements, high resolution mea¬ 
surements, and drug discovery applications. 
As will be discussed below, exact mass mea¬ 
surements are essential to many types of mass 
spectrometry-based screening and drug dis¬ 
covery today. 
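
The definition of resolution as M divided by the m/z difference to the next highest ion can be illustrated with a small calculation. The Python sketch below uses a textbook pair that is not drawn from this chapter: carbon monoxide and molecular nitrogen, which share a nominal mass of 28. The monoisotopic atomic masses are standard values, the function names are hypothetical, and the mass of the lost electron is ignored because it cancels in the difference.

```python
# Standard monoisotopic atomic masses (unified atomic mass units).
MASS_C = 12.000000
MASS_N = 14.003074
MASS_O = 15.994915

HIGH_RESOLUTION_THRESHOLD = 10_000  # "at least 10,000" per the text


def required_resolving_power(mz_a: float, mz_b: float) -> float:
    """M / delta-M needed to separate two neighboring singly charged ions.

    M is taken as the lower m/z value and delta-M as the distance to the
    next highest ion, following the definition given in the text.
    """
    low, high = sorted((mz_a, mz_b))
    if high == low:
        raise ValueError("ions with identical m/z cannot be resolved")
    return low / (high - low)


def is_high_resolution(resolving_power: float) -> bool:
    """True if a resolving power meets the usual high resolution threshold."""
    return resolving_power >= HIGH_RESOLUTION_THRESHOLD


mass_co = MASS_C + MASS_O   # 27.994915
mass_n2 = 2 * MASS_N        # 28.006148

needed = required_resolving_power(mass_co, mass_n2)
print(round(needed))                 # roughly 2,500
print(is_high_resolution(needed))    # False: this pair does not need high resolution
```

Ions that differ by only a few millimass units at m/z values near 500 need a far higher resolving power than this pair, which is why the 10,000 threshold matters for drug-like molecules.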

Biomedical applications of mass spectrom¬ 
etry began during the 1960s both at academic 
institutions and pharmaceutical companies. 
These applications depended on the volatiliza¬ 
tion (usually by heating) of pharmaceutical 
compounds and biochemicals before their gas-phase
ionization using EI or CI. To increase
the thermal stability and volatility of these 
compounds, a variety of derivatization meth¬ 
ods were developed to mask polar functional 
groups and reduce hydrogen bonding between 
molecules. These methods were particularly 
effective for use with gas chromatography- 
mass spectrometry (GC-MS), which was intro¬ 
duced during the 1960s as a practical and pow¬ 
erful tool for qualitative and quantitative 
analysis of compounds in mixtures. Both EI
and CI were immediately useful for GC-MS,
because both of these ionization methods re¬ 
quire that the analytes be in the gas phase. 
When capillary GC was incorporated into GC- 
MS, this technique reached maturity. GC-MS 
may be used to select, identify, and quantify 
organic compounds in complex mixtures at 
the femtomole level. The speed of GC-MS is 
determined by the chromatography step, 
which typically requires several minutes to 1 h 
per analysis. By the 1970s, some organic 
chemists were announcing that organic mass 
spectrometry had reached maturity and that 
no new applications were possible. Like the 
physicists and physical chemists who had pro¬ 
nounced the end of mass spectrometry a gen¬ 
eration earlier, this group would soon be 
proved wrong. 

Although GC-MS remains important for 
the analysis of many organic compounds, this 
technique is limited to volatile and thermally 
stable compounds that comprise only a small 
fraction of all organic compounds and even 
fewer biomedically important molecules. 
Therefore, thermally unstable compounds, in¬ 
cluding many pharmaceutical compounds 
such as nucleic acid analogs and biomolecules 
such as proteins, carbohydrates, and nucleic 
acids, cannot be analyzed in their native forms 
using GC-MS. (For more details regarding 
GC-MS and its applications, see Watson 1997, 
Section 4.) Although derivatization facilitates 
the GC-MS analysis of many of these com¬ 
pounds, alternative ionization techniques 
were needed for the analysis of the vast major¬ 
ity of polar and non-volatile compounds of in¬ 
terest to drug discovery. 

During the 1970s and early 1980s, desorption
ionization techniques such as field desorption
(FD), desorption EI, desorption CI
(DCI), and laser desorption were developed to
extend the use of mass spectrometry toward
the analysis of more polar and less volatile
compounds (see Watson 1997, Section 4, for
more information regarding desorption ionization
techniques including DCI and FD). Although
these techniques helped extend the
mass range of mass spectrometry beyond a
traditional limit of m/z 1000 and toward ions
of m/z 5000, the first breakthrough in the analysis
of polar, non-volatile compounds occurred
in 1982 with the invention of fast atom bombardment
(FAB) (2). FAB and its counterpart,
liquid secondary ion mass spectrometry
(LSIMS), facilitated the formation of abundant
molecular ions, protonated molecules,
and deprotonated molecules of non-volatile
and thermally labile compounds such as peptides,
chlorophylls, and complex lipids up to
approximately m/z 12,000. FAB and LSIMS
use energetic particle bombardment (fast atoms
or ions from 3 to 30,000 V of energy) to
ionize compounds dissolved in non-volatile
matrices such as glycerol or 3-nitrobenzyl alcohol
and desorb them from this condensed
phase into the gas phase for mass spectrometric
analysis (see Fig. 13.1). Protonated or deprotonated
molecules are usually abundant
and fragmentation is minimal.

Introduced in the late 1980s, matrix-assisted
laser desorption ionization (MALDI)
has helped solve the mass limit barriers of laser
desorption mass spectrometry so that singly
charged ions may be obtained up to m/z
500,000 and sometimes higher (3). For most
commercially available MALDI mass spectrometers,
ions up to m/z 200,000 are readily
obtained. As in FAB and LSIMS, MALDI samples
are mixed with a matrix to form a solution
that is loaded onto the sample stage for anal¬ 
ysis. Unlike the other matrix-mediated tech¬ 
niques, the solvent is evaporated before 
MALDI analysis, leaving sample molecules 
trapped in crystals of solid phase matrix. The 
MALDI matrix is selected to absorb the pulse 
of laser light directed at the sample. Most 
MALDI mass spectrometers are equipped 
with a pulsed UV laser, although IR lasers are 
available as an option on some commercial in¬ 
struments. Therefore, matrices are often sub¬ 
stituted benzenes or benzoic acids with strong 
UV absorption properties. During MALDI, the 
energy of the short but intense UV laser pulse 
obliterates the matrix and in the process de¬ 
sorbs and ionizes the sample. Like FAB and 
LSIMS, MALDI typically produces abundant 
protonated or deprotonated molecules with 
little fragmentation. 

By the late 1960s, when GC-MS had become a standard
technique, LC-MS was
still in the developmental stages. Producing
gas-phase sample ions for analysis in a vacuum
system while removing the high performance
liquid chromatography (HPLC) mobile
phase proved to be a challenging task. Early
LC-MS techniques included a moving belt interface
to desolvate and transport the HPLC
eluate into a CI or EI ion source or a direct
inlet system in which the eluate was pumped
at a low flow rate (1-3 μL/min) into a CI
source. However, neither of these systems was
robust enough or suitable for a broad enough
range of samples to gain widespread acceptance.

Figure 13.1. Scheme for desorption ionization using FAB or LSIMS from a liquid matrix (O).

Because FAB (or LSIMS) requires that the 
analyte be dissolved in a liquid matrix, this 
ionization technique was easily adapted for in¬ 
fusion of solution-phase samples into the FAB 
ionization source in an approach known as 
continuous-flow FAB. Then, continuous-flow 
FAB was connected to microbore HPLC col¬ 
umns for LC-MS applications (4). Because this 
method is limited to microbore HPLC applications
at flow rates of <10 μL/min and requires
considerable operator intervention, it is not 
ideal for the analysis of large sample sets. In¬ 
stead, more robust techniques have been de¬ 
veloped to fulfill this requirement. However, 
continuous-flow FAB is still in use in some 
laboratories. 

Like continuous-flow FAB, the popularity 
of particle beam interfaces is diminishing, but 
systems are still available from commercial 
sources. During particle beam LC-MS, the 
HPLC eluate is sprayed into a heated chamber 


connected to a vacuum pump. As the droplets 
evaporate, aggregates of analyte (particles) 
form and pass through a momentum separa¬ 
tor that removes the lower molecular weight 
solvent molecules. Finally, the particle beam 
enters the mass spectrometer ion source 
where the aggregates strike a heated plate 
from which the analyte molecules evaporate 
and are ionized using conventional EI or CI.
Particle beam LC-MS is limited to the analysis
of volatile and thermally stable compounds
that are amenable to flash evaporation and EI
or CI mass spectrometry. Therefore, this approach
is not used for polar biochemicals such
as carbohydrates, sugars, peptides, proteins,
or nucleic acids.

Because thermospray became the first 
widely used LC-MS technique (during the late 
1970s and early 1980s), this technique should 
be mentioned here. Thermospray facilitates
the interfacing of standard analytical HPLC 
systems at flow rates up to 1 mL/min with 
mass spectrometers. Although the interface 
between the HPLC and mass spectrometer is 
inefficient and exhibits low sensitivity for 
most analytes, thermospray has been useful 
for the LC-MS analysis of many types of small 
molecules. During thermospray, the HPLC el¬ 
uate is sprayed through a heated capillary into 
a heated desolvation chamber at reduced pres¬ 
sure. Gas phase ions remaining after desolva¬ 
tion of the droplets are extracted through a 
skimmer into the mass spectrometer for analysis.
The sensitivity of thermospray is poor
because there is no mechanism or driving
force to enhance the number of sample ions
entering the gas phase from the spray during
desolvation. Also, thermally labile compounds
tend to decompose in the heated source. These
problems were solved when thermospray was
replaced by electrospray during the late 1980s.

Figure 13.2. Positive ion APCI
mass spectrum of the red carotenoid
lycopene in a solution of
methanol and tert-butyl methyl
ether (1:1; v/v). In this analysis,
lycopene formed a protonated molecule
instead of a molecular ion, M⁺•.

During the 1990s, electrospray and atmo¬ 
spheric pressure chemical ionization (APCI) 
became the standard interfaces for LC-MS. 
Today, APCI and electrospray ionization are 
the most widely used ionization sources and 
HPLC interfaces for drug discovery using 
mass spectrometry. Unlike thermospray, par¬ 
ticle beam or continuous-flow FAB, electro¬ 
spray and APCI interfaces operate at atmo¬ 
spheric pressure and do not depend on 
vacuum pumps to remove solvent vapor. As a 
result, they are compatible with a wide range 
of HPLC flow rates. Also, no matrix is re¬ 
quired. Both APCI and electrospray are com¬ 
patible with a wide range of HPLC columns 
and solvent systems. As with all LC-MS systems,
the solvent system should contain only volatile
solvents, buffers, or ion pair agents to reduce
fouling of the mass spectrometer ion
source. In general, APCI and electrospray 
form abundant molecular ion species. When 
fragment ions are formed, they are usually 
more abundant in APCI than electrospray 
mass spectra. 

The APCI interface uses a heated nebulizer 
to form a fine spray of the HPLC eluate, which 
is much finer than the particle beam system
but similar to that formed during thermospray.
A cross-flow of heated nitrogen gas is used to
facilitate the evaporation of solvent from the
droplets. The resulting gas-phase sample molecules
are ionized by collisions with solvent ions, which
are formed by a corona discharge in the atmospheric
pressure chamber. Molecular ions, M⁺• or M⁻•, and/or
protonated or deprotonated molecules can be
formed. The relative abundance of each type
of ion depends on the sample itself, the HPLC 
solvent, and the ion source parameters. Next, 
ions are drawn into the mass spectrometer an¬ 
alyzer for measurement through a narrow 
opening or skimmer that helps the vacuum 
pumps to maintain very low pressure inside 
the analyzer, while the APCI source remains 
at atmospheric pressure. For example, the 
positive ion APCI mass spectrum of lycopene 
is shown in Fig. 13.2. The carotenoid lycopene 
is the red pigment of ripe tomatoes and is un¬ 
der clinical investigation for the prevention of 
prostate cancer (5). 

During electrospray, the HPLC eluate is 
sprayed through a capillary electrode at high 
potential (usually 2000-7000 V) to form a fine 
mist of charged droplets at atmospheric pres¬ 
sure. As the charged droplets migrate towards 
the opening of the mass spectrometer because 
of electrostatic attraction, they encounter a 
cross-flow of heated nitrogen that increases 
solvent evaporation and prevents most of the 
solvent molecules from entering the mass 
spectrometer. Molecular ions, protonated or
deprotonated molecules, and cationized species
such as [M + Na]⁺ and [M + K]⁺ can be
formed. (For additional information on
electrospray ionization, see Cole 1997, Section 4.)
In addition to singly charged ions, electro¬ 
spray is unique as an ionization technique in 
that multiply charged species are common and 
often constitute the majority of the sample ion 
abundance. The relative abundance of each of 
these species depends on the chemistry of the 
analyte, the pH, the presence of proton donat¬ 
ing or accepting species, and the levels of trace 
amounts of sodium or potassium salts in the 
mobile phase. In contrast, APCI, MALDI, EI,
CI, and FAB/LSIMS usually produce singly
charged species. A consequence of forming
multiply charged ions is that they are detected
at lower m/z values (i.e., z > 1) than the
corresponding singly charged species. This has the
benefit of allowing mass spectrometers with
modest m/z ranges to detect and measure ions
of molecules with very high masses. For example,
electrospray has been used to measure
ions with molecular weights of hundreds of
thousands or even millions of Daltons on mass
spectrometers with m/z ranges of only a few
thousand. (For a review of LC-MS techniques,
see Niessen 1999, Section 4.)
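The effect of multiple charging on the observed m/z value can be illustrated with a short calculation. The following is a minimal sketch, assuming positive ion electrospray in which each charge is carried by an added proton; the 150,000 Da analyte and the m/z 4000 analyzer limit are hypothetical values chosen only for illustration.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton

def mz_positive(neutral_mass: float, charge: int) -> float:
    """Return m/z for an [M + zH]^z+ ion of the given neutral mass."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge

def charge_states_in_range(neutral_mass: float, mz_max: float, max_charge: int = 200):
    """List the charge states whose m/z falls at or below the analyzer limit."""
    return [z for z in range(1, max_charge + 1)
            if mz_positive(neutral_mass, z) <= mz_max]

# Hypothetical example: a 150 kDa protein on an analyzer limited to m/z 4000.
states = charge_states_in_range(150_000.0, 4000.0)
print("lowest detectable charge state:", states[0])
print("m/z at that charge state:", round(mz_positive(150_000.0, states[0]), 2))
```

In this sketch the singly charged ion lies far outside the analyzer range, whereas the more highly charged ions of the same molecule fall within it.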

An example of the C18 reversed phase
HPLC-negative ion electrospray mass spectrometric
(LC-MS) analysis of an extract of the
botanical Trifolium pratense L. (red clover) is
shown in Fig. 13.3. Extracts of red clover are
used as dietary supplements by menopausal 
and postmenopausal women and are under in¬ 
vestigation as alternatives to estrogen replace¬ 
ment therapy (6). The two-dimensional map 
illustrates the amount of information that 
may be acquired using hyphenated techniques 
such as LC-MS. In the time dimension, chromatograms
are obtained, and a sample computer-reconstructed
mass chromatogram is
shown for the signal at m/z 269. An intense
chromatographic peak was detected eluting at
12.4 min. In the m/z dimension, the negative
ion electrospray mass spectrum recorded at
12.4 min shows a base peak at m/z 269. Based
on comparison with authentic standards (data
not shown), the ion of m/z 269 was found to
correspond to the deprotonated molecule of
genistein, which is an estrogenic isoflavone
(6). Because almost no fragmentation of the
genistein ion was observed, additional characterization
would require CID and MS-MS as
discussed in the next section.
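The computer-reconstructed mass chromatogram described above amounts to extracting, from every scan, the signal that falls within a narrow window around one m/z value. The following is a minimal sketch of that idea; the scan list is hypothetical data invented for illustration (it is not the red clover data set), and the function name and the 0.5 m/z window are assumptions.

```python
def reconstructed_ion_chromatogram(scans, target_mz, tol=0.5):
    """Return (retention_time, summed_intensity) pairs for ions within target_mz +/- tol.

    `scans` is a list of (retention_time_min, [(mz, intensity), ...]) tuples.
    """
    trace = []
    for rt, peaks in scans:
        signal = sum(i for mz, i in peaks if abs(mz - target_mz) <= tol)
        trace.append((rt, signal))
    return trace

# Hypothetical scans: retention time in minutes, then (m/z, intensity) pairs.
scans = [
    (12.0, [(253.1, 40.0), (269.0, 5.0)]),
    (12.2, [(269.1, 60.0), (283.1, 10.0)]),
    (12.4, [(269.0, 100.0), (283.0, 8.0)]),
    (12.6, [(268.9, 35.0), (253.0, 20.0)]),
]

trace = reconstructed_ion_chromatogram(scans, target_mz=269.0)
apex_rt, apex_signal = max(trace, key=lambda point: point[1])
print("trace:", trace)
print("apex retention time (min):", apex_rt)
```

Reading the same data along the other axis, the peak list stored for the apex scan is the mass spectrum recorded at that retention time.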

When analyzing complex mixtures such as 
the botanical extract shown in Fig. 13.3, the 
use of chromatographic separation before 
mass spectrometric ionization and analysis is 
essential to distinguish between isomeric com¬ 
pounds. Even simple mixtures of synthetic 
compounds might contain isomers that would 
require LC-MS for adequate characterization. 
Another problem overcome by using a chro¬ 
matography step before mass spectrometric 
analysis is ion suppression. No matter what 
ionization technique is used, the presence of 
multiple compounds in the ion source might 
enhance the ionization of one compound while 
suppressing the ionization of another. Usu¬ 
ally, only some of the compounds in a complex 
mixture can be detected by mass spectrometry 
without chromatographic separation. The 
presence of salts and buffers in a sample can 
also suppress sample ionization. Therefore, 
LC-MS has become a powerful tool for analyz¬ 
ing natural products, synthetic organic com¬ 
pounds, and pharmaceutical agents and their 
metabolites. 

In general, APCI facilitates the ionization 
of non-polar and low molecular weight species, 
and electrospray is more useful for the ioniza¬ 
tion of polar and high molecular weight com¬ 
pounds. In this sense, APCI and electrospray 
are often complementary ionization tech¬ 
niques. However, during the analysis of large 
or diverse combinatorial libraries, both polar 
and non-polar compounds are usually present. 
As a result, no one set of ionization conditions 
using APCI or electrospray is adequate to de¬ 
tect all the compounds contained in the library 
of compounds. Therefore, a UV ionization 
technique called atmospheric pressure photo¬ 
ionization (APPI) has been developed for use 
with combinatorial libraries and LC-MS (7). 
Recently, APPI became a commercially avail¬ 
able ionization alternative to APCI and elec¬ 
trospray. During APPI, a liquid solution or 
HPLC eluate is sprayed at atmospheric pres¬ 
sure, as in APCI. Instead of using a corona 
discharge as in APCI, ionization occurs during 
APPI because of irradiation of the analyte 
molecules by an intense UV light source. Obviously,
the carrier solvent must not absorb
UV light at the same wavelengths, or interference
would prevent sample ionization and detection.
The use of APPI as an alternative to
APCI and electrospray for drug discovery applications
is under investigation.

Figure 13.3. Two-dimensional map showing the LC-MS analysis of an extract of red clover under
investigation for the management of menopause. Reversed phase separation was carried out using a
C18 HPLC column in the time dimension and negative ion electrospray mass spectrometry was used
for compound detection and molecular weight determination in the second dimension.

Desorption ionization techniques like FAB,
MALDI, and electrospray facilitate the molecular
weight determination of a wide range of
polar, non-polar, low, and high molecular
weight compounds including drugs and drug
targets such as proteins. However, the "soft"
ionization character of these techniques
means that most of the ion current is concentrated
in molecular ions, and few structurally
significant fragment ions are formed. To enhance
the amount of structural information in
these mass spectra, CID may be used to produce
more abundant fragment ions from molecular
ion precursors formed and isolated
during the first stage of mass spectrometry.
Then, a second mass spectrometry analysis
may be used to characterize the resulting
product ions. This process is called tandem
mass spectrometry or MS-MS and is illustrated
in Fig. 13.4.

Figure 13.4. Scheme illustrating the selectivity of MS-MS and the process by which CID facilitates
fragmentation of preselected ions. Negative ion electrospray tandem mass spectrum of lycopene.
CID was used to induce fragmentation of the molecular ion of m/z 536. As a result, the fragment ion
of m/z 467 was formed by the loss of a terminal isoprene unit. This fragment ion may be used to
distinguish lycopene from isomeric α-carotene and β-carotene, which lack terminal isoprene groups.

Another advantage of the use of tandem
mass spectrometry is the ability to isolate a
particular ion such as the molecular ion of the
analyte of interest during the first mass spectrometry
stage. This precursor ion is essentially
purified in the gas-phase and free of impurities
such as solvent ions, matrix ions, or
other analytes. Finally, the selected ion is frag¬ 
mented using CID and analyzed using a sec¬ 
ond mass spectrometry stage. In this manner, 
the resulting tandem spectrum contains ex¬ 
clusively analyte ions without impurities that 
might interfere with the interpretation of the 
fragmentation patterns. In summary, CID 
may be used with LC-MS-MS or desorption 
ionization and MS-MS to obtain structural in¬ 
formation such as amino acid sequences of 
peptides and sites of alkylation of nucleic acids,
or to distinguish structural isomers such
as β-carotene and lycopene. Beginning in
2001, TOF-TOF tandem mass spectrometers 
became available from instrument manufac¬ 
turers. These instruments have the potential 
to deliver high resolution tandem mass spec¬ 
tra with high speed that should be compatible 
with the chip-based chromatography systems 
now under development. 

Over the course of the last century, mass 
spectrometry has become an essential ana¬ 
lytical tool for a wide variety of biomedical 
applications including drug discovery and 
development. By combining mass spectrom¬ 
etry with chromatography as in LC-MS or by 
adding another stage of mass spectrometry 
as in MS-MS, the selectivity of the technique 
increases considerably. As a result, mass 
spectrometry offers all of the analytical elements
that are essential to modern drug discovery,
namely speed, sensitivity, and selectivity.

2 CURRENT TRENDS AND RECENT 
DEVELOPMENTS 

Since the early 1990s, pharmaceutical re¬ 
search has focused on combinatorial chemis¬ 
try (8, 9) and high-throughput screening (10) 
in an effort to accelerate the pace of drug dis¬ 
covery. The goal has been to produce, in a 
short time, large numbers of synthetic organic 
compounds representing a great diversity of 
chemical structures through a process called 
combinatorial chemistry and then quickly 
screen them in vitro against pharmacologi¬ 
cally significant targets such as enzymes or 
receptors. The "hits" identified through these 
high-throughput screens may then be opti¬ 
mized by quickly and efficiently synthesizing 
and then screening large numbers of analogs 
called targeted or directed libraries. As a re¬ 
sult, lead compounds might emerge from such 
combinatorial chemistry drug discovery pro¬ 
grams in a few weeks instead of several years. 
Furthermore, a single organic chemist using
combinatorial synthetic methods might synthesize
thousands of compounds or more in a
single week instead of fewer than five in the
same time using conventional techniques, and
a single medicinal chemist might identify hundreds
of lead compounds per month instead of
just one or two in the same period of time.

Accompanying this new drug discovery 
paradigm, new scientific journals have been 
established such as Combinatorial Chemistry 
& High Throughput Screening, Journal of 
Combinatorial Chemistry, Journal of Biomo- 
lecular Screening, and Molecular Diversity 
(see list of journal websites in Section 4). The 
variety of topics published in these journals 
reflects the multidisciplinary nature of the 
current drug discovery process and ranges 
from organic chemistry, medicinal chemistry, 
molecular modeling, molecular biology, and 
pharmacology, to analytical chemistry. As de¬ 
scribed below, the most significant analytical 
component of drug discovery has become mass 
spectrometry. Only mass spectrometry has be¬ 
come an essential element at all stages of the 
drug discovery and development process. 

Although a variety of spectroscopic and 
chromatographic techniques, including infra¬ 
red spectroscopy, nuclear magnetic resonance 
spectroscopy, fluorescence spectroscopy, gas 
chromatography, HPLC, and mass spectrom¬ 
etry, are being used to support drug discovery 
in various capacities, some of them, such as 
gas chromatography and fluorescence spec¬ 
troscopy, are not applicable to most new chem¬ 
ical entities, some are not specific enough for 
chemical identification (e.g., infrared spec¬ 
troscopy), and other techniques suffer from 
low throughput (e.g., nuclear magnetic reso¬ 
nance spectroscopy). Unlike gas chromatogra¬ 
phy, HPLC is compatible with virtually all 
drug-like molecules without the need for 
chemical derivatization to increase thermal 
stability or volatility. In addition, mass spec¬ 
trometry provides a universal means to char¬ 
acterize and distinguish drugs based on both 
molecular weight and structural features 
while at the same time providing high 
throughput. With the development of routine 
LC-MS interfaces and ionization techniques 
such as electrospray and APCI, mass spec¬ 
trometry has also become an ideal HPLC de¬ 
tector for the analysis of combinatorial librar¬ 
ies (11), and LC-MS, MS-MS, and LC-MS-MS 
have become fundamental tools in the analysis 
of combinatorial libraries and subsequent 
drug development studies (12-14). 


The application of combinatorial chemistry 
and high-throughput screening to drug dis¬ 
covery has altered the traditional serial pro¬ 
cess of lead identification and optimization 
that previously required years of human ef¬ 
fort. Consequently, neither the synthesis of 
new chemical entities nor their screening is 
limiting the pace of drug discovery. Instead, a 
new bottleneck is the verification of the struc¬ 
ture and purity of each compound in a combi¬ 
natorial library or of each lead compound ob¬ 
tained from an uncharacterized library using 
high-throughput screening. Because the num¬ 
ber of lead compounds entering the drug de¬ 
velopment process has increased, in part be¬ 
cause compounds are entering development at 
earlier stages than in the past, the traditional 
drug development investigations concerning 
absorption, distribution, metabolism, and ex¬ 
cretion (ADME) and even toxicology evalua¬ 
tions of new drug entities have become addi¬ 
tional bottlenecks. As a solution to the drug 
development bottlenecks, high-throughput 
assays to assess the metabolism, bioavailabil¬ 
ity, and toxicity of lead compounds are being 
developed and applied earlier than ever during 
the drug discovery process, so that only those 
compounds most likely to become successful 
drugs enter the more expensive and slower 
preclinical pharmacology and toxicology stud¬ 
ies. In support of these new combinatorial 
chemistry synthetic programs and new high- 
throughput assays, mass spectrometry has 
emerged as the only analytical technique with 
sufficient throughput, sensitivity, selectivity, 
and robustness to address all of these bottle¬ 
necks. 

2.1 LC-MS Purification of Combinatorial 
Libraries 

Although combinatorial libraries were originally
synthesized as mixtures, today most libraries
are prepared in parallel as discrete
compounds and then screened individually in
microtiter plates of 96-well, 384-well, or 1536-well
formats. To facilitate subsequent structure-activity
analyses and to assure the validity
of the screening results, many laboratories
verify the structure and purity of each compound
before high-throughput screening.
Semi-preparative HPLC has become the most
popular technique for the purification of combinatorial
libraries on the milligram scale because
of high throughput and the ease of automation.
Typically during semi-preparative
HPLC, fraction collection is initiated whenever
a UV signal is observed above a predetermined
threshold. This procedure usually results
in the collection of several fractions per
analysis and hence creates additional issues
such as the need for large fraction collector
beds and the need for secondary analysis using
flow-injection mass spectrometry, LC-MS, or
LC-MS-MS to identify the appropriate fractions.
When purification of large numbers of
combinatorial libraries is required, this approach
can become prohibitively time consuming
and expensive.

Figure 13.5. Mass-directed purification of a combinatorial library. Chromatographic separation
was carried out using gradient elution of 10-90% acetonitrile in water for 7 min after an initial hold
at 10% acetonitrile for 1 min. (a) Total ion chromatogram showing desired product and impurities. (b)
Computer-reconstructed ion chromatogram (RIC) corresponding to the expected product. (c) Post-purification
analysis of the isolated component with a purity >90%. (Reproduced from Ref. 15 by
permission of Elsevier Science.)

To enhance the efficiency of this purification
procedure, the steps of HPLC purification
and mass spectrometric analysis may be combined
into automated mass-directed fractionation
(15-17). Any size HPLC column may be
used, and only a small fraction of the eluant
(~µL/min) is diverted to the mass spectrometer
equipped for APCI or electrospray ionization.
Because all of the components, including
autosampler, injector, HPLC, switching valve, 
mass spectrometer, and fraction collector, are 
controlled by computer, the procedure may be 
fully automated. For greatest efficiency, the 
system may be programmed to collect only 
those peaks displaying the desired molecular 
ions, or alternatively, all peaks displaying 
abundant ions within a specified mass range. 
An example of the MS-guided purification of a 
compound synthesized during the parallel 
synthesis of a combinatorial library of discrete 
compounds is shown in Fig. 13.5. Although the 
crude yield of the reaction product was only 
30% (Fig. 13.5a), the desired product was de¬ 
tected based on its molecular ion (Fig. 13.5b). 
After MS-guided fractionation, re-analysis using
LC-MS showed that the desired product
was >90% pure (Fig. 13.5c).
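The decision made by a mass-directed fraction collector can be sketched in a few lines. The following is a simplified illustration rather than the control software of any commercial system: the function name, the intensity threshold, the m/z tolerance, and the scan data are all hypothetical. Collection starts when the ion current of the expected molecular ion rises above the threshold and stops when it falls below it, so peaks that do not contain the target ion are never collected.

```python
def collection_windows(scans, target_mz, threshold, tol=0.5):
    """Return (start_time, end_time) windows in which the target ion exceeds the threshold.

    `scans` is a list of (retention_time_min, [(mz, intensity), ...]) tuples
    ordered by retention time.
    """
    windows = []
    start = None
    last_rt = None
    for rt, peaks in scans:
        signal = sum(i for mz, i in peaks if abs(mz - target_mz) <= tol)
        if signal >= threshold and start is None:
            start = rt                      # target ion appeared: open the collection valve
        elif signal < threshold and start is not None:
            windows.append((start, rt))     # target ion gone: close the valve
            start = None
        last_rt = rt
    if start is not None:                   # run ended while still collecting
        windows.append((start, last_rt))
    return windows

# Hypothetical run: the expected product gives an ion at m/z 412.3; the ions
# at m/z 310.2 and 355.1 belong to impurities and must not trigger collection.
scans = [
    (3.0, [(310.2, 500.0), (412.3, 20.0)]),
    (3.1, [(412.3, 80.0)]),
    (3.2, [(412.3, 900.0)]),
    (3.3, [(412.2, 2500.0), (310.2, 50.0)]),
    (3.4, [(412.3, 700.0)]),
    (3.5, [(412.3, 60.0)]),
    (3.6, [(355.1, 1200.0)]),
]

windows = collection_windows(scans, target_mz=412.3, threshold=500.0)
print("collect fractions during:", windows)
```

A UV-triggered collector would respond to all three peaks in this hypothetical run, whereas the mass-directed logic yields a single fraction.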

The use of MS-guided purification of com¬ 
binatorial libraries provides a means for re¬ 
ducing the number of HPLC fractions col¬ 
lected per sample and eliminates the need for 
post-purification analysis to further charac¬ 
terize and identify each compound as would be 
necessary when using UV-based fractionation.
The ionization technique (i.e., electrospray, 
APCI, or APPI), and ionization mode (positive 
or negative) must be suitable for the combina¬ 
torial compound so that molecular ion species 
are formed. Also, a suitable mobile phase and 
HPLC column must be selected. As an alter¬ 
native to HPLC, supercritical fluid chroma¬ 
tography-mass spectrometry (SFC-MS) has 
been used for the high-throughput analysis of 
combinatorial libraries (18, 19). The advan¬ 
tages of SFC-MS relative to conventional 
LC-MS for the purification of combinatorial 
libraries of compounds are the lower viscosities
and higher diffusivities of condensed CO₂
compared with HPLC mobile phases and the 
ease of solvent removal and disposal after 
analysis. However, SFC instrumentation re¬ 
mains more expensive and less widely avail¬ 
able than conventional HPLC systems. 

2.2 Confirmation of Structure and Purity of 
Combinatorial Compounds 

The determination of molecular weights, ele¬ 
mental compositions, and structures of com¬ 
pounds used for high-throughput screening, 
whether discrete compounds or combinatorial 
library mixtures, is typically carried out using 
mass spectrometry, because traditional spec¬ 
troscopic and gravimetric techniques are too 
slow to keep pace with combinatorial chemical 
synthesis. In addition, mass spectrometry may 
be used to assess the purity of compounds be¬ 
ing used for high-throughput screening. The 
highest-throughput technique for confirming 
molecular weights and structures of drug can¬ 
didates is flow injection analysis of sample so¬ 
lutions using electrospray, APCI, or APPI 
mass spectrometry. Typically, no sample prep¬ 
aration is necessary. 

Although any organic mass spectrometer 
may be used to confirm the molecular weight 
of a compound, tandem mass spectrometers 


provide additional structural information 
through the use of CID to produce fragment 
ions. As discussed above (see also Table 13.1), 
tandem mass spectrometers include triple 
quadrupole instruments, QqTOF mass spec¬ 
trometers, ion trap mass spectrometers, mul¬ 
tiple sector magnetic sector instruments, 
FTICR instruments, and the new TOF-TOF 
mass spectrometers. In most applications, 
APCI or electrospray ionization is used. 

In addition to molecular weight and frag¬ 
mentation patterns, high precision and high 
resolution mass spectrometers such as 
QqTOF instruments, reflectron TOF mass 
spectrometers, double focusing magnetic sec¬ 
tor mass spectrometers, and FTICR instru¬ 
ments are necessary for the measurement of 
exact masses of drugs and drug candidates for 
the determination of elemental compositions. 
The combination of high resolution and high 
precision is especially useful for determining 
the elemental compositions of compounds in 
combinatorial library mixtures without hav¬ 
ing to isolate each compound using chroma¬ 
tography or some other separation technique. 
Because FTICR instruments and the hybrid 
QqTOF mass spectrometers are capable of si¬ 
multaneously measuring exact masses at high 
resolution of both molecular ions and fragment
ions generated during MS-MS, these instruments
are becoming extremely popular
within drug discovery programs. 
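Two quantities recur whenever exact mass measurements are discussed: the mass error in parts per million and the resolving power needed to separate two ions of the same nominal mass. The following is a minimal sketch of both calculations, using the common m/Δm definition of resolving power; the m/z values are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million relative to the theoretical value."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

def resolving_power_needed(mz_a: float, mz_b: float) -> float:
    """Approximate m/delta-m required to separate two neighboring peaks."""
    return ((mz_a + mz_b) / 2.0) / abs(mz_a - mz_b)

# Hypothetical measurement: an ion expected at m/z 500.0000 observed at 500.0010.
print("error (ppm):", round(ppm_error(500.0010, 500.0000), 2))

# Hypothetical pair of nominally isobaric ions separated by 0.02 m/z units.
print("resolving power needed:", round(resolving_power_needed(500.20, 500.22)))
```

Under these assumptions, a pair of ions separated by 0.02 m/z units near m/z 500 calls for a resolving power of roughly 25,000.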

As an example of the exact mass measure¬ 
ment of a combinatorial library mixture, the 
FTICR negative ion electrospray mass spectra 
of a 36- and a 120-compound peptide library 
mixture are shown in Fig. 13.6. The resolution 
achieved in this experiment was 20,000- 
40,000. Although the exact masses of all com¬ 
ponents in a small combinatorial library can 
often be measured during a single infusion ex¬ 
periment, on-line HPLC separation or the 
analysis of discrete compounds is sometimes 
required to overcome ion suppression problems.
However, LC-MS is a relatively slow process
because of the slow chromatographic separation
step. Because LC-MS is required in
many instances for the analysis of mixtures
and to eliminate interfering salts or buffers,
two approaches have emerged to increase the
throughput of this technique: parallel LC-MS
and fast LC-MS. One approach to increasing
throughput of the rate-limiting chromatographic
separation has been to simultaneously
interface multiple HPLC columns to a single
mass spectrometer. This approach is called
parallel LC-MS. Commercial parallel electrospray
interfaces and HPLC systems are now
available that can accommodate up to eight
HPLC columns simultaneously (20-22). Although
the multiple sprays are introduced to
the ion source simultaneously, these streams
may be sampled in a time-dependent manner
to minimize cross contamination between
channels.

Figure 13.6. (a) Partial negative ion electrospray mass spectrum of a 36-component library mixture.
Both the measured mass and the difference between the measured and theoretical values (in
ppm) are shown. (b) Negative ion electrospray spectrum of the 120-component library showing the
resolution of three nominally isobaric peaks. (Reproduced from Ref. 24 by permission of Bentham
Science Publishers.)

Another solution to increasing the 
throughput of LC-MS has been to minimize 
the time required for HPLC separation 
through an approach called fast HPLC. HPLC 
separations are accelerated by using shorter 
columns and higher mobile phase flow rates. 
Because coelution of some species is likely to 
occur during fast chromatographic separa¬ 
tions, the selectivity of the mass spectrometer 
is essential for the characterization and/or 
quantitative analysis of the target compound. 
However, samples of compounds prepared us¬ 
ing combinatorial chemistry are usually sim¬ 
ple mixtures of reagents, by-products, and 
products that require only partial chromato¬ 
graphic purification to prevent ion suppres¬ 
sion effects during mass spectrometric analy¬ 
sis. 

In addition to molecular weight determination
using conventional MS or high-resolution exact mass
measurement and structural confirmation using
MS-MS, fast LC-MS is also used to assess
the purity and yield of combinatorial products 
(15, 23). Before high-throughput screening, 
many researchers analyze combinatorial li¬ 
braries for both purity and structural identity 
using mass spectrometry to assure the validity 
of structure-activity relationships that might 
be derived from the screening data. Fast 
LC-MS and LC-MS-MS may be carried out to 
satisfy this requirement using gradients (usu¬ 
ally a step gradient with a reverse phase 
HPLC column) with a total cycle time of 1-3 
min (24) or using an isocratic system requiring 
less than 1 min per analysis. A variety of 
HPLC columns are used for fast LC-MS that 
include narrow bore (2-mm) and analytical 
bore (4.6-mm) columns with length typically 
from 0.5-5 cm. The mobile phase flow rate for 
these fast LC-MS analyses is usually from 
1.5-5 mL/min. 

2.3 Encoding and Identification of 
Compounds in Combinatorial Libraries 
and Natural Product Extracts 

The use of mass spectrometric identification
in combinatorial chemistry is not limited to
the analysis of synthetic products as a means
of quality control, but also extends to the identification
of active compounds or "hits" during
high-throughput screening. Although the synthesis
and screening of discrete compounds
(25) enables them to be followed through the 
entire process by using partial encoding or 
bar-coding, it is sometimes advantageous to 
screen libraries prepared as mixtures (26) and 
use a technique such as mass spectrometry to 
rapidly identify the hit(s) in the mixture. One 
approach to the rapid deconvolution of combi¬ 
natorial library mixtures is to prepare librar¬ 
ies containing compounds of unique molecular 
weight and then identify them using mass 
spectrometry. However, such libraries are 
necessarily small because the molecular 
weight of most drug-like molecules is between
150 and 400 Da. Because of the molecular weight
degeneracy of larger combinatorial libraries, 
several encoding strategies have been devised 
to rapidly identify active compounds in these 
mixtures (27-29). 
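The molecular weight degeneracy of even a modest library is easy to demonstrate by enumeration. The following is a minimal sketch that builds a hypothetical tripeptide library from twelve amino acid residues and counts how many distinct nominal masses it contains. The choice of residues and the library design are assumptions made for illustration; the residue masses are standard nominal values.

```python
from collections import Counter
from itertools import product

# Nominal (integer) residue masses in Da for twelve common amino acids.
RESIDUE_NOMINAL_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101,
    "L": 113, "N": 114, "D": 115, "K": 128, "E": 129, "F": 147,
}
WATER = 18  # added once per peptide for the free N- and C-termini

def nominal_mass(sequence: str) -> int:
    """Nominal mass of a linear peptide written in one-letter code."""
    return sum(RESIDUE_NOMINAL_MASS[r] for r in sequence) + WATER

# Hypothetical library: every tripeptide that can be built from the residues.
library = ["".join(p) for p in product(RESIDUE_NOMINAL_MASS, repeat=3)]
mass_counts = Counter(nominal_mass(seq) for seq in library)

n_unique = len(mass_counts)
worst_mass, worst_count = mass_counts.most_common(1)[0]
print("compounds in library:", len(library))
print("distinct nominal masses:", n_unique)
print("most crowded mass:", worst_mass, "Da shared by", worst_count, "compounds")
```

Because sequence isomers such as GAS and SAG necessarily share a mass, the number of distinct masses is far smaller than the number of compounds, which is the motivation for the encoding strategies discussed next.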

Because most combinatorial libraries con¬ 
tain compounds with degenerate molecular 
weights, various tagging strategies have been 
devised to uniquely identify library com¬ 
pounds bound to beads. Most of these tagging 
approaches are based on the synthesis of en¬ 
coding molecules. For example, peptide (30) or 
oligonucleotide (31) labels have been synthe¬ 
sized on the beads in parallel to the target mol¬ 
ecules and then sequenced for bead decoding. 
Alternatively, haloarene tags have been incor¬ 
porated during synthesis and then identified 
with high sensitivity using electron-capture 
gas chromatography detection (32). In addi¬ 
tion to the increased time and cost for the syn¬ 
thesis of a library containing tagging moieties, 
the tagging groups themselves might interfere 
with screening giving false positive or nega¬ 
tive results. 

For peptide libraries, one solution to this 
problem uses matrix-assisted laser desorption 
ionization (MALDI) mass spectrometry to di¬ 
rectly desorb and identify peptides from beads 
that were screened and found to be hits (33). 
This technique is called the termination syn¬ 
thesis approach. Because the peptide library 
compounds are analyzed directly, products 
with amino acid deletions or substitutions, 
side-reaction products, or incomplete depro¬ 
tection are readily observed. Also, because 
there are no extra molecules used for chemical
tagging, this source of interference is avoided.
However, this approach is specific to peptide 
libraries and is not necessarily applicable to 
other types of combinatorial libraries. 

Another approach that eliminates possible 
interference from the chemical tags, "ratio en¬ 
coding," has been developed for the mass spec- 
trometric identification of bioactive leads us¬ 
ing stable isotopes incorporated into the 
library compounds (29, 34). Within the ligand 
itself, the code might be a single-labeled atom 
that is conveniently inserted whenever a common
reagent transfers at least one atom to the
target compound or ligand. The code consists 
of an isotopic mixture having one of the many 
predetermined ratios of stable isotopes and 
can be incorporated in the linker or added 
through a reagent used during the synthesis. 
The mass spectrum of the compound shows a 
molecular ion with a unique isotope ratio that 
codes for a particular library compound. For 
example, Wagner et al. (29) used isotope ratio 
encoding during the synthesis of a 1000-compound
peptoid library and were able to identify
uniquely all the components based on their 
isotopic patterns and molecular weights. Be¬ 
cause isotope ratio codes are contained within 
each combinatorial compound, a chemical tag 
is not required. The speed of MS-based decod¬ 
ing outperforms most other decoding technol¬ 
ogies, which are time consuming and decode a 
restricted set of active compounds. 
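Decoding an isotope ratio code reduces to comparing the observed ratio of labeled to unlabeled molecular ions against the set of predetermined ratios. The following is a simplified sketch of that comparison, not the procedure used in the cited work: the codebook, its five ratios, and the intensities are hypothetical, and the contribution of naturally occurring heavy isotopes to the labeled peak is ignored.

```python
def decode_ratio(intensity_light: float, intensity_heavy: float, codebook: dict) -> str:
    """Return the code whose expected heavy-isotope fraction is closest to the observed one."""
    total = intensity_light + intensity_heavy
    if total <= 0:
        raise ValueError("no signal to decode")
    observed_fraction = intensity_heavy / total
    return min(codebook, key=lambda code: abs(codebook[code] - observed_fraction))

# Hypothetical codebook: each code is a predetermined fraction of the labeled reagent.
CODEBOOK = {"A": 0.10, "B": 0.30, "C": 0.50, "D": 0.70, "E": 0.90}

# Hypothetical molecular ion intensities for the unlabeled and labeled species.
code = decode_ratio(intensity_light=720.0, intensity_heavy=290.0, codebook=CODEBOOK)
print("decoded ratio code:", code)
```

In practice the decoded ratio would be combined with the measured molecular weight, since the ratio alone identifies only one of the predetermined codes.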

Although combinatorial synthesis provides 
rapid access to large numbers of compounds 
for screening during drug discovery and lead 
optimization, these libraries are usually based 
on a small number of common structures or 
scaffolds. There is a constant need for increas¬ 
ing the molecular diversity of combinatorial 
libraries and finding new scaffolds, and natu¬ 
ral products have always been a rich source of 
chemical diversity for drug discovery. The tra¬ 
ditional approach to screening natural prod¬ 
ucts for drug leads uses bioassays to test or¬ 
ganic solvent extracts for activity. If strong 
activity is detected, then activity-guided frac¬ 
tionation of the crude extract is used to isolate 
the active compound(s), which are identified using
mass spectrometry (including tandem
mass spectrometry and exact mass measurements),
IR, UV/VIS spectrometry, and NMR.
Recently, a variety of mass spectrometry-based
affinity screening methods have been
developed to streamline the tedious process
of activity-guided fractionation. These approaches
are discussed in Section 2.4.

Whether lead compounds in natural prod¬ 
uct extracts are isolated using bioassay-guided 
fractionation or mass spectrometry-based 
screening, there is a high probability that the 
structure of the active compound(s) has al¬ 
ready been reported in the natural product lit¬ 
erature. In such cases, the tedious process of 
complete structure elucidation using a battery 
of spectrometric tools should be unnecessary. 
Instead, mass spectrometry alone may be used 
to quickly "dereplicate" or identify the known 
compounds based on molecular weight, frag¬ 
mentation patterns, and elemental composi¬ 
tion in combination with natural product da¬ 
tabase searching (35-39). Commercially 
available natural products databases include 
NAPRALERT (40), Scientific & Technical Information
Network (STN) (41), and the Dictionary
of Natural Products (42). Because
some of these databases also contain UV/VIS 
absorbance data, it is also advantageous to use 
a photodiode array detector between the 
HPLC and mass spectrometer to obtain additional
spectrometric data during LC-UV-MS
dereplication (36,37). 
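
The database-matching step of dereplication can be illustrated with a minimal sketch. It compares a measured monoisotopic mass with a small in-memory table of known compounds and returns those within a ppm tolerance. The table contents, the function names, and the 5 ppm default are assumptions made for illustration only; a real workflow would query one of the databases cited above and would also weigh fragmentation patterns and UV/VIS data, because isomers cannot be distinguished by mass alone.

```python
# Hypothetical mini-database: compound name -> monoisotopic neutral mass (Da).
# A real search would use a natural products database instead.
KNOWN_COMPOUNDS = {
    "caffeine": 194.0804,
    "quercetin": 302.0427,
    "genistein": 270.0528,
    "apigenin": 270.0528,  # isomer of genistein: same formula, same mass
}

PROTON_MASS = 1.007276  # Da


def neutral_mass_from_mh(mz_mh):
    """Convert the m/z of a singly charged [M+H]+ ion to a neutral mass."""
    return mz_mh - PROTON_MASS


def dereplicate(neutral_mass, database=KNOWN_COMPOUNDS, tol_ppm=5.0):
    """Return (name, mass error in ppm) for entries within tol_ppm."""
    hits = []
    for name, mass in database.items():
        error_ppm = (neutral_mass - mass) / mass * 1e6
        if abs(error_ppm) <= tol_ppm:
            hits.append((name, round(error_ppm, 2)))
    return sorted(hits)


if __name__ == "__main__":
    # An [M+H]+ ion measured at m/z 271.0601 matches two isomers, so
    # fragmentation or UV/VIS data would be needed to tell them apart.
    print(dereplicate(neutral_mass_from_mh(271.0601)))
```

In this sketch an exact mass narrows the candidates to one elemental composition, and the remaining ambiguity between isomers is what the tandem mass spectra and photodiode array data are meant to resolve.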

2.4 Mass Spectrometry-Based Screening 

The earliest approaches to combinatorial synthesis
used portioning and mixing (26) and enabled
the synthesis of combinatorial libraries
containing hundreds of thousands to millions 
of compounds. Today, this approach remains 
the most efficient method for preparing enor¬ 
mous libraries of compounds. However, until 
the mid-1990s, efficient screening techniques 
did not exist to rapidly identify the "hits" 
within large combinatorial mixtures. There¬ 
fore, chemists were motivated to develop ways 
to prepare large numbers of discrete compounds
using massively parallel synthesis,
which could be assayed quickly for pharmaco¬ 
logical activity using high throughput screen¬ 
ing one compound at a time. Recently, several 
mass spectrometry-based screening assays 
have been developed that are suitable for 
screening combinatorial library mixtures, and 
some are even useful for screening natural 
product extracts, which have always been a
source of molecular diversity for drug discovery.
All of the mass spectrometry-based
screening methods use receptor binding of ligands
as the basis for identification of lead
compounds.

Figure 13.7. Affinity chromatography
combined with LC-MS-MS for screening
combinatorial library mixtures.

2.4.1 Affinity Chromatography-Mass Spec¬ 
trometry. Since the introduction of affinity 
chromatography more than 30 years ago, this 
technique has become a standard biochemical 
tool for the isolation and identification of new 
binding partners to specific target molecules. 
Therefore, the coupling of affinity chromatog¬ 
raphy to mass spectrometry is a logical exten¬ 
sion of this technique, and the application of 
affinity LC-MS to the screening of combinato¬ 
rial libraries has been demonstrated by sev¬ 
eral groups (43, 44). During affinity LC-MS 
screening, a receptor molecule such as a bind¬ 
ing protein or enzyme is immobilized on a 
solid support within a chromatography col¬ 
umn. The library mixture is pumped through 
the affinity column in a suitable binding 
buffer so that any ligands in the mixture with 
affinity for the receptor would be able to bind. 
Then, unbound material is washed away. Fi¬ 
nally, the specifically bound ligands are eluted 
using a destabilizing mobile phase and identi¬ 
fied using mass spectrometry. This affinity- 
column LC-MS assay is summarized in Fig. 
13.7. 


In some applications (43), ligands are 
eluted from the affinity column and then 
trapped on a second column such as a reverse 
phase HPLC column. LC-MS or LC-MS-MS 
identification of the ligands (hits) is then car¬ 
ried out using the trapping column. In other 
systems, ligands are identified directly from 
the affinity column using mass spectrometry 
(44). For example, Kelly et al. (44) prepared an 
affinity column containing immobilized
phosphatidylinositol-3-kinase and used it for direct
LC-MS screening of a 361-component peptide 
library. Electrospray mass spectrometry and 
tandem mass spectrometry were used to iden¬ 
tify the ligands released from the affinity col¬ 
umn using pH gradient elution. 

Advantages of affinity chromatography- 
mass spectrometry for screening during drug 
discovery include versatility and re-use of the 
column. Both combinatorial libraries and nat¬ 
ural product extracts can be screened using 
this approach, and a wide range of binding 
buffers may be used. Mass spectrometry-com¬ 
patible mobile phases are only required during 
the final LC-MS detection step. Furthermore, 
a single column may be used multiple times to 
screen different samples for ligands unless the 
destabilization solution irreversibly dena¬ 
tures, releases, or inhibits the receptor. 

Despite these advantages, affinity chromatography
has numerous drawbacks that have
prompted the development of alternative mass
spectrometer screening tools. For example, immobilization
of the receptor might change its affinity
characteristics causing false negative or
false positive hits. This is particularly problematic
for receptors that are solution-phase in their
native state. Also, developing and then implementing
an immobilization scheme is often a
slow, tedious, and even expensive process, and
this process is unique for each new receptor. Finally,
false positive hits are often obtained when
screening large molecularly diverse libraries, because
there are usually compounds in such mixtures
that have affinity for the stationary phase
or linker molecule instead of the receptor.

Figure 13.8. GPC followed by LC-MS-MS
for screening mixtures of combinatorial libraries.
After incubation of a receptor with a
library of compounds, the ligand-receptor
complexes (L-R) are separated from the low
molecular weight unbound library compounds
using GPC. Next, the L-R complexes
are denatured during reversed phase HPLC
to release the ligands for MS-MS identification.

2.4.2 Gel Permeation Chromatography- 
Mass Spectrometry. Another type of chroma¬ 
tography that has been combined with mass 
spectrometry as a screening system for drug 
discovery is gel permeation chromatography 
(GPC) (45,46). Also called size-exclusion chro¬ 
matography, GPC separates molecules accord¬ 
ing to size as they pass through a stationary 
phase containing particles with a defined pore 
size. During GPC-based screening, a library 
mixture is pre-incubated with a macromolec- 
ular receptor to allow any ligands in the li¬ 
brary to bind, and then GPC is used to sepa¬ 
rate the large receptor-ligand complexes from 
the unbound low molecular weight com¬ 
pounds in the mixture. Finally, ligands are re¬ 
leased from the receptor during reversed
phase HPLC and identified either on-line or
off-line using tandem mass spectrometry. 
This screening method is illustrated in Fig. 
13.8. 

During the pre-incubation and GPC steps, 
any binding buffer may be used, because the 
binding buffer will be removed during reverse 
phase LC-MS analysis. However, the GPC sep¬ 
aration step must be carried out quickly, be¬ 
cause ligands begin to dissociate from the re¬ 
ceptor immediately and can become lost into 
the size exclusion gel. Despite this disadvan¬ 
tage, this approach allows both receptor and 
ligand to be screened in solution, which avoids 
some of the problems associated with the use 
of affinity columns for screening. The GPC 
LC-MS-MS screening method should also be 
suitable for screening natural product ex¬ 
tracts as well as combinatorial library mix¬ 
tures. 
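
The need to carry out the GPC step quickly can be made concrete with a rough sketch. It assumes simple first-order dissociation of the ligand-receptor complex with no rebinding once the free ligand enters the gel; the rate constants and separation times used are hypothetical and are not taken from the studies cited above.

```python
import math


def fraction_complex_remaining(k_off_per_s, separation_time_s):
    """Fraction of ligand-receptor complex that survives the separation.

    Assumes first-order dissociation with no rebinding:
        fraction = exp(-k_off * t)
    """
    if k_off_per_s < 0 or separation_time_s < 0:
        raise ValueError("rate constant and time must be non-negative")
    return math.exp(-k_off_per_s * separation_time_s)


if __name__ == "__main__":
    # Hypothetical dissociation rate constants (1/s) and GPC run times (s).
    for k_off in (0.001, 0.01, 0.1):
        for t in (10, 60, 300):
            surviving = fraction_complex_remaining(k_off, t)
            print(f"k_off={k_off:g} 1/s, t={t:>3d} s: {surviving:.3f} remaining")
```

Under these assumptions, ligands with fast off-rates are largely lost unless the separation is completed within seconds, which is consistent with the caution given above.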

2.4.3 Affinity Capillary Electrophoresis- 
Mass Spectrometry. Affinity capillary electro¬ 
phoresis was originally used for the determi¬ 
nation of the binding constants of small 
molecules to proteins (47-49). This solution- 
based technique is rapid and requires only 
small amounts of ligands. Affinity constants 
are measured based on the mobility change of 
the ligand on interaction with the receptor 
present in the electrophoretic buffer (50). By 
combining affinity capillary electrophoresis 
with on-line mass spectrometric detection and
identification, affinity constants for multiple
compounds can be measured in a single analysis
(51). Recognizing that on-line mass spectrometric
detection was helpful for the identification
of each ligand, Chu et al. (52) extended
this approach to include the screening of combinatorial
libraries as a means of drug discovery.
The data in Fig. 13.9 show the results of
screening a 100-tetrapeptide library for affinity
to vancomycin using affinity capillary electrophoresis-mass
spectrometry. Without vancomycin
in the electrophoresis buffer, all the
peptides eluted within 3 min. When vancomycin
was present, the peptides eluted in order of
affinity, with the highest affinity compounds
being detected between 4.5 and 5 min. Positive
ion electrospray tandem mass spectrometry
was used to identify the highest affinity ligands
(see Fig. 13.9b).

Figure 13.9. Affinity capillary
electrophoresis-UV-mass
spectrometry of a 100-tetrapeptide
library screened for binding
to vancomycin (104 µM in the
electrophoresis buffer). (a) The
elution of peptides was monitored
with UV absorbance during
capillary electrophoresis,
and the elution time increased
with increasing affinity for vancomycin.
(b) Positive ion electrospray
mass spectrum with CID
of the Tris adduct of the protonated
peptide detected at ~5 min
in the electropherogram shown
in (a). Panel (b) plots intensity
against m/z from 200 to 800, with
ions labeled [Tris+H]+, [M+H]+,
and [M+Tris+H]+ for Fmoc-DDFA.
(Reproduced from Ref. 52
by permission of the American
Chemical Society.)

Note that some peptide ligands such as 
Fmoc-DDFA were detected as adducts with
Tris, which was used in the electrophoresis
buffer. Although the identification of this pep¬ 
tide was not prevented by the formation of this 
adduct, some buffers used during electro¬ 
phoresis might interfere with mass spectro- 
metric ionization and detection. Also, the 
types of libraries that have been screened us¬ 
ing this approach have contained modest 
numbers of synthetic analogs such as pep¬ 
tides. Libraries exceeding 400 members re¬ 
quired preliminary purification using affinity 
chromatography to reduce the number of com¬ 
pounds (52). As a result, this approach is prob¬ 
ably not ideal for screening libraries contain¬ 
ing molecularly diverse compounds or for 
screening natural product extracts. However, 
affinity capillary electrophoresis-mass spec¬ 
trometry is fast; each analysis requires less 
than 10 min. Also, it may be used to measure 
affinity constants for ligand-receptor interac¬ 
tions.

2.4.4 Frontal Affinity Chromatography-
Mass Spectrometry. Like affinity chromatog¬ 
raphy-mass spectrometric screening (see Sec¬ 
tion 2.4.1), frontal affinity chromatography 
uses an affinity column containing immobi¬ 
lized receptor molecules (53). The difference 
between the two screening methods is that the 
ligands are continuously infused into the col¬ 
umn during frontal affinity chromatography 
and detected using mass spectrometry. Com¬ 
pounds with no affinity for the immobilized 
receptor elute immediately in the void volume, 
but the elution of the ligands is delayed. As 
compounds compete for binding sites on the 
affinity column, these sites become saturated 
until ligands begin to elute from the column at 
their infusion concentration. In this manner, 
frontal affinity chromatography may be used 
to measure affinity constants for ligands, and 
by using a mass spectrometer for on-line identification
of ligands, this technique becomes a
screening method (54, 55). 
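
How a breakthrough volume translates into an affinity constant can be sketched as follows. The text does not state the working equation, so this example assumes the commonly used frontal analysis relation V - V0 = Bt / (Kd + [L]0), where V is the breakthrough volume of the ligand, V0 the void volume, Bt the amount of active immobilized receptor, and [L]0 the infusion concentration. The function name and all numerical values are hypothetical.

```python
def kd_from_breakthrough(v_ul, v0_ul, bt_pmol, ligand_conc_um):
    """Estimate Kd (µM) from a frontal chromatography breakthrough volume.

    Assumed relation (not given in the text):
        V - V0 = Bt / (Kd + [L]0)
    With Bt in pmol and volumes in µL, Bt / (V - V0) is in pmol/µL = µM.
    """
    delay_ul = v_ul - v0_ul
    if delay_ul <= 0:
        raise ValueError("no retention beyond the void volume")
    return bt_pmol / delay_ul - ligand_conc_um


if __name__ == "__main__":
    # Hypothetical column: 500 pmol of active receptor, 50 µL void volume,
    # each ligand infused at 1 µM.
    for name, v in [("ligand A", 150.0), ("ligand B", 300.0)]:
        kd = kd_from_breakthrough(v, v0_ul=50.0, bt_pmol=500.0,
                                  ligand_conc_um=1.0)
        print(f"{name}: breakthrough at {v:.0f} µL, Kd about {kd:.1f} µM")
```

In this sketch the later-eluting ligand has the smaller dissociation constant, matching the description above that elution is delayed for compounds with affinity for the receptor. The relation strictly applies to a single ligand; in a mixture, competition for binding sites shifts the breakthrough volumes.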

During frontal affinity chromatography- 
mass spectrometry, signals for all compounds 
eluting from the affinity column are recorded 
by the mass spectrometer, and the last compounds
to elute at their infusion concentrations
represent the highest affinity compounds
or "hits." An example of the screening
of six oligosaccharides with different binding
affinities for an immobilized monoclonal car¬ 
bohydrate-binding antibody is shown in Fig. 
13.10. Compounds 1-3 eluted immediately (no 
affinity), whereas compounds 4-6 eluted in 
order of increasing affinity for the antibody. 
Dissociation constants were determined to be 
185, 12.6, and 1.8 µM for compounds 4-6,
respectively (54).

Because frontal affinity chromatography 
uses a conventional affinity column, this tech¬ 
nique provides additional applications of this 
type of column to investigators already using 
affinity chromatography-mass spectrometry (see Section
2.4.1). However, the same limitations and dis¬ 
advantages of using immobilized receptors 
still apply, such as non-specific binding to the 
stationary phase, the development time and 
cost of preparing the affinity columns, and the 
possibility that immobilizing the receptor 
might alter its binding characteristics and 
specificity. In addition, mass spectrometric de¬ 
tection creates some additional limitations.
Because all library compounds must be monitored
simultaneously, the compounds must be
selected so that they have unique molecular 
weights. Also, one compound in the mixture 
should not suppress the ionization of another. 
Therefore, this approach is probably re¬ 
stricted to the screening of small combinato¬ 
rial libraries that are similar in chemical 
structure and ionization efficiencies. Finally, 
the binding buffer used for affinity chromatog¬ 
raphy must be compatible with on-line APCI 
or electrospray mass spectrometry. This 
means that the mobile phase must be volatile 
and usually of low ionic strength (i.e., typically 
<40 mM for electrospray ionization). 

2.4.5 Bioaffinity Screening using Electro¬ 
spray FTICR Mass Spectrometry. Although 
FTICR mass spectrometry may be used to de¬ 
termine the exact masses of combinatorial li¬ 
brary compounds and to confirm their struc¬ 
tures using CID and high resolution tandem 
mass spectrometry (see definitions of CID and 
MS-MS in Section 1), electrospray FTICR 
mass spectrometry may be used for the direct 
screening of combinatorial libraries without 
the need for any pre-purification or chroma¬ 
tography. In this application, a combinatorial 
library is pre-incubated with a receptor in solution
and then analyzed directly using electrospray
to identify receptor-ligand complexes
in the gas phase (56-60). Once a
receptor-ligand complex is ionized and 
trapped in the FTICR mass spectrometer, the 
mass difference between the complex and the 
receptor alone might be measured with suffi¬ 
cient resolution and accuracy to determine the 
mass(es) and perhaps elemental composition
of the ligand(s). If the ligand carries a
charge, then CID may be used to dissociate the 
ligand for subsequent analysis using tandem 
mass spectrometry. This elegant and simple 
screening approach is summarized in Fig. 
13.11. 
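
The mass-difference step can be sketched in a few lines. The example assumes positive-ion electrospray in which each charge is carried by a proton, and it treats the complex as a simple non-covalent adduct whose neutral mass is the sum of the receptor and ligand masses. The m/z values and charge states are hypothetical; for negative ions, such as those of RNA targets, the proton term would be added rather than subtracted.

```python
PROTON_MASS = 1.007276  # Da


def neutral_mass(mz, charge):
    """Neutral mass of an ion observed as [M + zH]z+ (positive mode assumed)."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS)


def ligand_mass(mz_complex, z_complex, mz_receptor, z_receptor):
    """Ligand mass as the difference between complex and free receptor."""
    return (neutral_mass(mz_complex, z_complex)
            - neutral_mass(mz_receptor, z_receptor))


if __name__ == "__main__":
    # Hypothetical 5+ ions of a free receptor and its ligand complex.
    mass = ligand_mass(mz_complex=2061.027276, z_complex=5,
                       mz_receptor=2001.007276, z_receptor=5)
    print(f"ligand mass about {mass:.3f} Da")
```

Because any error in the measured m/z is multiplied by the charge state, the accuracy of the ligand mass depends directly on the resolution and mass accuracy of the instrument, which is why FTICR is used for this approach.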

An extension of this FTICR mass spectrom¬ 
etry-based screening technique has been to 
screen a combinatorial library for ligands to 
two receptors simultaneously (59, 60). In this 
example, the two receptors consisting of RNA 
constructs representing the prokaryotic (16s) 
rRNA and eukaryotic (18s) rRNA A-site were 
incubated simultaneously with an aminoglycoside
library to identify potential ligands. By
screening a target mixture against the same
library, screening efficiency is enhanced and
the number of analyses required is reduced.

Figure 13.10. Frontal affinity chromatography-mass spectrometry screening of a 6-oligosaccharide
mixture for affinity to an immobilized carbohydrate-binding monoclonal antibody. Top: positive ion
electrospray total ion chromatogram. Middle: computer-reconstructed mass chromatograms for the
molecular ion species of all six compounds. Compounds 1-3 (solid line) eluted in the void volume,
indicating no binding to the antibody. Breakthrough signals for compounds 4, 5, and 6 appear at
successively later times, indicating increasing affinity for the immobilized antibody. Bottom: positive
ion electrospray mass spectra recorded at times I, II, and III as indicated in the middle trace. The
protonated molecules of compounds 1-6 are labeled.

The advantage of this screening method 
over other approaches is the elimination of pu¬ 
rification steps before mass spectrometric 
identification. Also, the disadvantages associ¬ 
ated with chromatographic separations are 
eliminated. However, the use of FTICR mass
spectrometric screening restricts the binding
buffer and receptors that may be used. Only
low ionic strength and volatile buffers are 
compatible with this approach (such as 10 mM 
ammonium acetate). Also, the receptor and li¬ 
gand must be highly purified to avoid impuri¬ 
ties that might interfere with ionization and 
detection. Therefore, this technique is proba¬ 
bly more suitable for the screening of combi¬ 
natorial libraries than complex natural prod¬ 
uct mixtures. Finally, the receptor-ligand 
complex must ionize efficiently during electrospray
under solvent and ion source conditions
that do not cause dissociation of the complex.

Figure 13.11. Bioaffinity electrospray
FTICR mass spectrometry. The isolation and
mass spectrometric identification of receptor-specific
ligands are carried out entirely in the
mass spectrometer without chromatography
or other separation steps.

2.4.6 Pulsed Ultrafiltration-Mass Spectrom¬ 
etry. A versatile approach to screening solu¬ 
tion phase combinatorial libraries and natural 
product extracts is pulsed ultrafiltration- 
mass spectrometry (61, 62), which uses a stan¬ 
dard LC-MS system with an ultrafiltration 
chamber substituted for the HPLC column.
The principle of pulsed ultrafiltration screening
of combinatorial libraries is shown in Fig.
13.12. During pulsed ultrafiltration, ligand- 
receptor complexes remain in solution in the 
ultrafiltration chamber while unbound library 
compounds and buffer are washed away. After 
unbound compounds are removed, the hits 
from the library are eluted from the chamber 
by destabilizing the ligand-receptor complex 
using an organic solvent, a pH change, or a
combination of both. The released ligands are
identified on-line using APCI or electrospray
mass spectrometry (61) or collected and analyzed
off-line using mass spectrometry, LC-MS,
or LC-MS-MS (63).

Figure 13.12. Combinatorial library screening using pulsed ultrafiltration mass spectrometry. During
the loading step (left), ligands are bound to the receptor either on-line (top) using a flow-through
approach or off-line (bottom two incubations). Unbound compounds and binding buffer, cofactors,
etc. are washed out of the ultrafiltration chamber to waste during a separation step (middle). Bound
ligands are dissociated from the receptor molecules and eluted from the chamber by introducing a
destabilizing solution such as methanol, pH change, etc. Finally, released ligands are identified using
mass spectrometry, tandem mass spectrometry, or LC-MS (right). (Reproduced from Ref. 64 by
permission of John Wiley & Sons.)

Figure 13.13. Identification of EHNA as the highest affinity ligand for adenosine deaminase in a
combinatorial library of 20 adenosine analogs using ultrafiltration electrospray mass spectrometry.
(Reproduced from Ref. 61 by permission of the American Chemical Society.)

An example of pulsed ultrafiltration mass 
spectrometry for the screening of a library of 
20 adenosine analogs for ligands to adenosine 
deaminase is shown in Fig. 13.13. After a 15-min
preincubation of the library compounds
(17.5 µM each except for EHNA, which was
present at 1.75 µM) with 2.1 µM adenosine
deaminase in 50 mM phosphate buffer, an aliquot
containing 420 pmol of the receptor was
injected into the ultrafiltration chamber and washed for
8 min at 50 µL/min with water to remove the
phosphate buffer and unbound or weakly 
binding library compounds. Methanol was in¬ 
troduced into the mobile phase to dissociate 
the enzyme-ligand complex and release bound 
ligands for identification by electrospray mass 
spectrometry. During methanol elution, only 
EHNA [erythro-9-(2-hydroxy-3-nonyl)adenine]
was detected as the [M+H]+ ion of m/z
278 (Fig. 13.13). In control experiments using
the library without enzyme, no library compounds
were detected during methanol elution
(Fig. 13.13, Control). Despite being
present at a 10-fold lower concentration than 
the natural substrate adenosine analogs, 
EHNA was easily identified because it had the 
highest affinity among the library compounds 
(Kd = 1.9 nM). This demonstrates the use of
ultrafiltration electrospray mass spectrometry
for identifying a high affinity ligand among
a set of analogs that bind to a specific receptor. 
In a follow-up lead optimization study using 
pulsed ultrafiltration mass spectrometry, a 
synthetic combinatorial library of EHNA ana¬ 
logs was screened for binding to adenosine 
deaminase, and structure-activity relation¬ 
ships for EHNA binding were identified (65). 
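
The quantities in the adenosine deaminase example can be checked with some simple arithmetic. The sketch below takes the receptor amount and concentration, the wash time and flow rate, and the EHNA concentration and Kd from the text. The binding calculation is an assumption added for illustration: it treats EHNA as the only ligand in a 1:1 equilibrium and ignores competition from the other 19 library members, so it gives an upper bound on the fraction bound.

```python
import math


def aliquot_volume_ul(receptor_pmol, receptor_conc_um):
    """Volume (µL) holding the given receptor amount; pmol / µM = µL."""
    return receptor_pmol / receptor_conc_um


def wash_volume_ul(wash_min, flow_ul_per_min):
    """Total wash volume (µL) passed through the chamber."""
    return wash_min * flow_ul_per_min


def fraction_ligand_bound(receptor_um, ligand_um, kd_um):
    """Fraction of ligand bound at equilibrium for 1:1 binding.

    Solves [RL]^2 - (R + L + Kd)[RL] + R*L = 0 for the complex [RL].
    Competition from other ligands is ignored (simplifying assumption).
    """
    total = receptor_um + ligand_um + kd_um
    complex_um = (total - math.sqrt(total ** 2
                                    - 4.0 * receptor_um * ligand_um)) / 2.0
    return complex_um / ligand_um


if __name__ == "__main__":
    print(f"aliquot: {aliquot_volume_ul(420.0, 2.1):.0f} µL")
    print(f"wash: {wash_volume_ul(8.0, 50.0):.0f} µL")
    bound = fraction_ligand_bound(receptor_um=2.1, ligand_um=1.75,
                                  kd_um=0.0019)
    print(f"EHNA bound before washing: {bound:.1%}")
```

With these values the injected aliquot is about 200 µL and the wash is about 400 µL. Under the stated assumptions nearly all of the EHNA is bound before washing, which is consistent with its detection during methanol elution even though it was present at the lowest concentration in the library.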

As an illustration of the versatility of 
pulsed ultrafiltration-mass spectrometry, 
binding assays for a variety of receptors have 
been reported including dihydrofolate reduc¬ 
tase (63), cyclooxygenase-2 (62), serum albu¬ 
min (66, 67) and estrogen receptors (68). Not 
only is pulsed ultrafiltration useful for identi¬ 
fying ligands to different receptors, but a wide 
range of combinatorial libraries and natural 
product extracts in any suitable binding buffer 
may be screened. In addition to combinatorial 
libraries, complex natural product extracts
have been screened (68), and neither plant nor
fermentation broth matrices were found to in¬ 
terfere with screening (62). As another exam¬ 
ple of the flexibility of this screening system, a 
centrifuge tube equipped with an ultrafiltra¬ 
tion membrane (69) has been used instead of 
an on-line ultrafiltration chamber. Other ap¬ 
plications of pulsed ultrafiltration-mass spec¬ 
trometry include screening drugs and drug 
candidates for metabolic stability (70), meta¬ 
bolic activation to reactive metabolites (71), 
and the measurement of affinity constants for 
ligand-receptor interactions (66,67). 

Metabolism and toxicity screening appli¬ 
cations of pulsed ultrafiltration use hepatic 
microsomes in the ultrafiltration chamber. 
For metabolic screening, drugs and the cofactor
nicotinamide adenine dinucleotide phosphate
(NADPH) are flow-injected through the ultrafiltration
chamber (oxygen is dissolved in
the mobile phase), and the metabolites 
formed by microsomal cytochrome P450 and 
any unreacted compounds flow out of the 
chamber for mass spectrometric identifica¬ 
tion and/or quantitative analysis (70). On¬ 
line applications require the use of volatile 
buffers, but LC-MS and LC-MS-MS may be 
used off-line to analyze the ultrafiltrate no 
matter what buffer had been used. Screen¬ 
ing drugs for metabolic activation using 
pulsed ultrafiltration-mass spectrometry is 
carried out in a similar manner, except that 
glutathione is coinjected along with NADPH 
and the drug substrate (71). MS-MS may be 
used on-line or LC-MS-MS may be used off¬ 
line to screen for glutathione adducts as an 
indication that the drug was metabolized to 
a reactive intermediate(s) that was trapped 
by reaction with glutathione. Finally, pulsed 
ultrafiltration may be used with UV or mass 
spectrometric detection to measure affinity 
constants of individual compounds (66). 

To measure affinity constants and other
physicochemical properties of binding such as
the number of binding sites, two pulsed ultrafiltration
measurements are carried out. First,
an aliquot or pulse of a ligand is injected
through the chamber, and the elution profile
is recorded. Then, the chamber is loaded with
a receptor, and the ligand is reinjected. If binding
occurs, the elution profile will be delayed
in proportion to the affinity constant. The control
injection is used to control for non-specific
binding to the apparatus. Because the concentration
of receptor and total amount of ligand
are known, and because the concentration of
free ligand is measured as it elutes from the
chamber over a wide range of concentrations,
the affinity constant and other binding parameters
may be calculated.

In most of the applications of pulsed ultra¬ 
filtration to date, serial analyses were carried 
out with a throughput of approximately one or 
two assays per hour. Because the purpose of 
these assays was to screen complex mixtures 
or to obtain metabolism data for new drug en¬ 
tities, the throughput of these analyses was 
acceptable, but was not high throughput. The 
rate limiting step in these analyses was the 
ultrafiltration separation and not the mass 
spectrometric detection. Two solutions have 
been reported to increase the throughput of 
pulsed ultrafiltration mass spectrometry. In 
the first solution, van Breemen et al. (70) used 
a multiplex ultrafiltration system in which up 
to 60 ultrafiltration chambers could be ar¬ 
ranged in parallel and interfaced to a single 
mass spectrometer. This scheme is shown in 
Fig. 13.14. In this system, a continuous flow of 
the buffer or mobile phase is maintained 
through the ultrafiltration chambers, but the 
mass spectrometer samples each ultrafiltrate
solution at 1-min intervals. The sampling time
would be selected to correspond to the time at 
which a maximum concentration of metabo¬ 
lites would be expected to elute from the 
chamber. This approach was demonstrated to 
increase the throughput of metabolic screen¬ 
ing using ultrafiltration mass spectrometry by 
60-fold. Although used originally for meta¬ 
bolic screening, this approach would be appli¬ 
cable to toxicity screening and drug discovery 
screening as well. 
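
The timing of the multiplexed system can be sketched as a simple schedule. The chamber count and 1-min sampling interval come from the text; the function names and the `elution_delay_min` parameter, which stands for the time at which metabolites are expected to reach their maximum concentration, are hypothetical and would have to be determined experimentally.

```python
def sampling_schedule(n_chambers, interval_min=1.0, elution_delay_min=0.0):
    """Return (chamber, injection time, MS sampling start) in minutes.

    Drugs are injected into successive chambers at fixed intervals, and
    the mass spectrometer is switched to each chamber after a fixed
    delay. elution_delay_min is a hypothetical tuning parameter.
    """
    return [
        (chamber, chamber * interval_min,
         chamber * interval_min + elution_delay_min)
        for chamber in range(n_chambers)
    ]


def screens_per_hour(interval_min):
    """Throughput when one chamber is sampled per interval."""
    return 60.0 / interval_min


if __name__ == "__main__":
    schedule = sampling_schedule(60, interval_min=1.0, elution_delay_min=5.0)
    for chamber, inject, sample in schedule[:3]:
        print(f"chamber {chamber}: inject at {inject:.0f} min, "
              f"sample at {sample:.0f} min")
    print(f"throughput: {screens_per_hour(1.0):.0f} screens per hour")
```

Compared with a serial rate of about one assay per hour, sampling a different chamber every minute accounts for the 60-fold increase described above.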

The second solution to increasing the 
throughput of pulsed ultrafiltration mass 
spectrometry has been to miniaturize the ul¬ 
trafiltration chamber volume while maintain¬ 
ing the flow rate and chamber pressure. Be¬ 
cause the ultrafiltration membrane cannot 
withstand high pressure without rupturing, 
the ultrafiltration process cannot be acceler¬ 
ated simply by increasing the flow rate 
through the chamber. The approach of Beverly
et al. (72) was to fabricate a 35-µL ultrafiltration
chamber that was approximately
threefold lower in volume than the smallest
reported by van Breemen et al. (61). As a result,
ultrafiltration mass spectrometric analyses
could be carried out at the rate of at least
three per hour, which corresponded to a threefold
enhancement of throughput. This study
suggests that chip-based ultrafiltration mass
spectrometry would have the potential to result
in a truly high-throughput system.

Figure 13.14. High-throughput pulsed ultrafiltration mass spectrometry system for screening drug
candidates for metabolic transformation. Multiple ultrafiltration chambers are connected in parallel
to a single mass spectrometer detector. After loading each chamber with liver microsomes, a different
drug is injected into each chamber at intervals of 1 min (for 60 screens/h using 60 chambers).
Constant flow of incubation buffer is maintained through all chambers, but only one chamber at a
time is connected on-line to the mass spectrometer. Drug metabolite profiles are recorded using mass
spectrometry for up to 1 min per chamber. (Reproduced from Ref. 70 by permission of the American
Society for Pharmacology and Experimental Therapeutics.)

The advantages of pulsed ultrafiltration- 
mass spectrometry include the variety of dif¬ 
ferent applications that may be carried out, 
the convenience of on-line screening, solution- 
phase screening, the ability to screen either 
combinatorial libraries or natural product ex¬ 
tracts, the diversity of receptors that may be 
screened, and the freedom to use either vola¬ 
tile or non-volatile binding buffers. For meta¬ 
bolic and toxicity screening, flow injection 
analyses have the additional advantages that 
product feedback inhibition is prevented so 
that the metabolic profile more closely approx¬ 
imates the in vivo system (70). Finally, the
disadvantages of pulsed ultrafiltration screening
for drug discovery include the washing
step, during which dissociation and loss of
weakly bound ligands might occur, and the 
slow speed of each experiment, which can take 
up to 1 h. 

2.4.7 Solid Phase Mass Spectrometric 
Screening. Because drugs are usually in a sol¬ 
uble form to be transported to the active sites 
in cells and tissues, it is logical that most mass 
spectrometry-based screening methods use 
solution-phase analysis of these compounds, 
and it is no surprise that most successful mass 
spectrometry screening assays use electro¬ 
spray ionization or APCI. However, solid 
phase ionization techniques such as matrix- 
assisted laser desorption ionization (MALDI) 
might be effective, provided that ligand-re¬ 
ceptor interactions are allowed to take place 
in an environment similar to in vivo condi¬ 
tions and that a suitable separation step is 
carried out before the preparation of the 
MALDI sample.

To use MALDI mass spectrometry for
screening, several research groups have developed
immobilized receptors on MALDI targets
or on solid supports that can be placed on a
MALDI target for use in the affinity purification 
of potential drugs from test solutions. Following 
procedures originally developed for affinity 
chromatography, the preparation of affinity sur¬ 
faces for MALDI mass spectrometry has been 
achieved quite easily. However, the use of these 
affinity MALDI chips for screening mixtures of 
small molecules during drug discovery has been 
unproductive. One of the problems has been the 
high background noise at low m/z values caused 
by the matrix used for MALDI. This problem
may be mitigated by eliminating the matrix or 
using alternative sample stages such as porous 
silicon chips (73, 74). However, noise persists 
because of the affinity support and immobilized 
receptor molecules. Another problem that has 
yet to be overcome is the elimination of the high 
background noise caused by non-specific bind¬ 
ing of test compounds to the affinity target. Al¬ 
though this problem is similar to the false posi¬ 
tive results and non-specific binding that occurs 
during affinity chromatography-mass spec¬ 
trometry (see Section 2.4.1), the signals for non-specific
binding are magnified by the fact that
the actual affinity surface is being irradiated and 
sampled by the MALDI laser beam. As a result, 
affinity-based screening coupled with MALDI 
mass spectrometry has not been a successful 
drug discovery approach. 

However, progress is being made in the use 
of affinity probes for the capture of proteins
and other macromolecules from biological so¬ 
lutions followed by MALDI mass spectromet- 
ric detection and identification (75-77). One 
affinity MALDI mass spectrometry method 
has been paired with the affinity probes used
in surface plasmon resonance systems (78). 
These affinity-based MALDI mass spectrome¬ 
try screening assays are promising approaches 
for testing blood or other biological fluids for 
the presence of specific proteins or other mac¬ 
romolecules. As a result, these have the poten¬ 
tial to become clinical diagnostic tools or 
might even lead to the identification of new 
therapeutic targets. However, they are un¬ 
likely to become useful for screening combina¬ 
torial libraries or natural product extracts for 
the purpose of drug discovery.


3 THINGS TO COME 

Mass spectrometry has become an essential 
analytical tool at every stage of the drug dis¬ 
covery and development process. In this chap¬ 
ter, the various applications of mass spectrom¬ 
etry to combinatorial chemistry and drug 
discovery have been highlighted. Although the 
speed of mass spectrometry matches the de¬ 
mands of combinatorial chemistry, the slow 
and serial nature of chromatography in the 
various LC-MS applications remains a bottle¬ 
neck that limits their throughput. Because 
mass spectrometry is highly selective, only 
partial chromatographic separations are 
needed for most measurements. In fact, the 
primary function of the chromatography step 
is usually to separate species that might oth¬ 
erwise interfere with the ionization process. 
Recognizing this limited function of chroma¬ 
tography during LC-MS-based screening as¬ 
says, manufacturers of chromatography col¬ 
umns are addressing this need by developing 
high-throughput columns for fast chromatog¬ 
raphy for LC-MS. Improvements in this direc¬ 
tion should continue to reduce the time re¬ 
quired for LC-MS from a few minutes to a few 
seconds. Meanwhile chip-based technology is 
beginning to emerge for miniaturized capillary
electrophoresis-mass spectrometry (CE-MS)
(79). These chips are being developed to
enable ultrafast and highly sensitive electro¬ 
spray mass spectrometric analysis. Because of 
their microscopic size, CE-MS chips have the 
potential to hold large arrays of samples that 
would facilitate high-throughput analysis. 

In terms of mass spectrometry instrumen¬ 
tation, the currently available instruments 
such as time-of-flight (TOF) analyzers and hy¬ 
brid quadrupole-TOF analyzers are able to ac¬ 
quire complete mass spectra at rates compat¬ 
ible with fast CE separations. As CE or 
ultrafast chromatography replaces conven¬ 
tional, slow HPLC applications, TOF-based 
mass spectrometers will be needed to replace 
the less efficient scanning types of instru¬ 
ments such as quadrupoles and ion traps for 
most high-throughput applications. FTICR 
mass spectrometry remains unsurpassed in 
terms of resolution and mass accuracy for both 
MS and MS-MS applications. However, the 
throughput of FTICR mass spectrometric
analysis needs to be increased to remain useful
for combinatorial chemistry applications.
Advances in increasing the throughput of 
FTICR mass spectrometry are anticipated. 

Hyphenated technologies such as LC- 
NMR-MS are being developed to support 
structure elucidation of combinatorial librar¬ 
ies (80). Although such technologies are still in 
a developmental stage, they have great poten¬ 
tial for analyses of combinatorial libraries and 
for natural product drug discovery (81-83). 
The main impediments of applying LC- 
NMR-MS to combinatorial chemistry remain 
poor sensitivity of the NMR, the obligatory use 
of deuterated solvents for chromatography, 
and the low throughput of NMR analyses. 
However, efforts are in progress to improve 
the throughput of NMR analyses (84-86). 

In conclusion, mass spectrometry provides 
rapid, reliable, sensitive, and selective analy¬ 
sis of combinatorial libraries for structure 
confirmation, purity analysis, and library de- 
convolution. In addition, mass spectrometric 
screening methods have been developed and 
are beginning to be applied to drug discovery. 
In the case of natural products, mass spec¬ 
trometry facilitates the screening of natural 
product extracts and facilitates the dereplica¬ 
tion and characterization of lead compounds. 
At different times during the last 100 years, 
first physicists and physical chemists and then 
organic chemists pronounced that mass spec¬ 
trometry had run out of new applications and 
had no future. Fortunately, they were wrong. 
Today, medicinal chemists recognize that the 
potential of mass spectrometry to contribute 
to all facets of drug discovery has only just 
begun to be explored. Furthermore, applications
of mass spectrometry to drug development
are even less mature and remain largely
unexplored. Mass spectrometry has
become a fundamental analytical tool for drug
discovery, and this role should continue to 
grow in the future. 

Electron Cryomicroscopy of Biological Macromolecules


1 MACROMOLECULAR STRUCTURE 
DETERMINATION BY USE OF 
ELECTRON MICROSCOPY 

The two principal methods of macromolecular 
structure determination that use scattering 
techniques are electron microscopy and X-ray 
crystallography. The most important differ¬ 
ence between the two is that the scattering 
cross section is about 10⁵ times greater for
electrons than it is for X-rays, so significant 
scattering using electrons is obtained for spec¬ 
imens that are 1 to 10 nm thick, whereas scat¬ 
tering or absorption of a similar fraction of an 
illuminating X-ray beam requires crystals 
that are 100 to 500 µm thick. The second main
difference is that electrons are much more eas¬ 
ily focused than X-rays because they are 
charged particles that can be deflected by elec¬ 
tric or magnetic fields. As a result, electron 
lenses are greatly superior to X-ray lenses and 
can be used to produce a magnified image of an 
object as easily as a diffraction pattern. This 
then allows the electron microscope to be 
switched back and forth instantly between im¬ 
aging and diffraction modes so that the image 
of a single molecule at any magnification can 
be obtained as conveniently as the electron 
diffraction pattern of a thin crystal. 
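
The factor of about 10⁵ can be checked against the specimen thicknesses quoted above. The short Python sketch below is an illustration only. The function name is ours, and it assumes that the thickness needed for a comparable scattered fraction is inversely proportional to the scattering cross section.

```python
def thickness_ratio(xray_thickness_um, electron_thickness_nm):
    """Ratio of an X-ray crystal thickness to an electron specimen thickness."""
    xray_m = xray_thickness_um * 1e-6          # micrometres -> metres
    electron_m = electron_thickness_nm * 1e-9  # nanometres -> metres
    return xray_m / electron_m

# Extreme pairings of the ranges quoted in the text:
# thinnest crystal with thickest EM specimen, and the reverse.
low = thickness_ratio(100.0, 10.0)
high = thickness_ratio(500.0, 1.0)
print(f"thickness ratio spans {low:.0e} to {high:.0e}")
```

The two extremes bracket the quoted cross-section ratio of about 10⁵, which is all that an order-of-magnitude statement of this kind requires.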

In the early years of electron microscopy of 
macromolecules, electron micrographs of mol¬ 
ecules embedded in a thin film of heavy atom 
stains (1,2) were used to produce pictures that 
were interpreted directly. Beginning with the 
work of Klug and Berger (3), a more rigorous 
approach to image analysis led first to the in¬ 
terpretation of the two-dimensional (2D) im¬ 
ages as the projected density summed along 
the direction of view and then to the ability to 
reconstruct the three-dimensional (3D) object 
from which the images arose (4, 5), with sub¬ 
sequent more sophisticated treatment of im¬ 
age contrast transfer (6). 

Later, macromolecules were examined by 
electron diffraction and imaging without the 
use of heavy atom stains by embedding the 
specimens in either a thin film of glucose (7) or 
in a thin film of rapidly frozen water (8-10), 
which required the specimen to be cooled
while it was examined in the electron microscope.
This use of unstained specimens thus
led to the structure determination of the mol¬ 
ecules themselves rather than the structure of 
a "negative stain" excluding volume, and has 
created the burgeoning field of 3D electron
microscopy of macromolecules.

Many medium resolution structures of 
macromolecular assemblies (e.g., ribosomes), 
spherical and helical viruses, and larger pro¬ 
tein molecules have now been determined by 
electron cryomicroscopy in ice. Four atomic 
resolution structures have been obtained by 
electron cryomicroscopy of thin 2D crystals 
embedded in glucose, trehalose, or tannic acid 
(11-14), where specimen cooling reduced the 
effect of radiation damage. One of these, the 
structure of bacteriorhodopsin (11), provided
the first structure of a seven-helix membrane 
protein. The medium resolution density distri¬ 
butions can often be interpreted in terms of 
the chemistry of the structure if a high resolu¬ 
tion model of one or more of the component 
pieces has already been obtained by X-ray, 
electron microscopy, or NMR methods. As a 
result, the use of electron microscopy is be¬ 
coming a powerful technique for which, in 
some cases, no alternative approach is possi¬ 
ble. Useful reviews [e.g., Dubochet et al. (9), 
Amos et al. (15), Walz and Grigorieff (16), and 
Baker et al. (17)] and a book [Frank (18)] have 
been written. 

2 ELECTRON SCATTERING 
AND RADIATION DAMAGE 

A schematic overview of scattering and imag¬ 
ing in the electron microscope is depicted in 
Fig. 14.1. The incident electron beam passes 
through the specimen and individual elec¬ 
trons are either unscattered or scattered by 
the atoms of the specimen. This scattering oc¬ 
curs either elastically, with no loss of energy 
and therefore no energy deposition in the 
specimen, or inelastically, with consequent 
energy loss by the scattered electron and ac¬ 
companying energy deposition in the speci¬ 
men, resulting in radiation damage. The elec¬ 
trons emerging from the specimen are then 
collected by the imaging optics, shown here for
simplicity as a single lens, but in practice con¬ 
sisting of a complex system of five or six lenses, 
with intermediate images being produced at 
successively higher magnification at different 
positions down the column. Finally, in the 
viewing area, either the electron diffraction 
pattern or the image can be seen directly by 
eye on the phosphor screen, or detected by a 
TV or CCD camera, or recorded on photo¬ 
graphic film or image plate. 

3 ELASTIC AND INELASTIC SCATTERING 

Figure 14.1. Schematic diagram showing the
principle of image formation and diffraction in
the transmission electron microscope. The incident
beam I₀ illuminates the specimen. Scattered and
unscattered electrons are collected by the
objective lens and focused back to form first an
electron diffraction pattern and then an image.
For a 2D or 3D crystal, the electron-diffraction
pattern would show a lattice of spots, each of
whose intensity is a small fraction of that of the
incident beam. In practice, an in-focus image has
no contrast, so images are recorded with the
objective lens slightly defocused to take
advantage of the out-of-focus phase-contrast
mechanism.

The coherent, elastically scattered electrons
contain all the high resolution information
describing the structure of the specimen.
The amplitudes and phases of the scattered 
electron beams are directly related to the 
amplitudes and phases of the Fourier com¬ 
ponents of the atomic distribution in the 
specimen. When the scattered beams are re¬ 
combined with the unscattered beam in the 
image, they create an interference pattern 
(the image), which, for thin specimens, is 
related approximately linearly to the density 
variations in the specimen. The information 
about the structure of the specimen can then 
be retrieved by digitization and computer- 
based image processing, as described later. 
The elastic scattering cross sections for elec¬ 
trons are not as simply related to the atomic 
composition as happens with X-rays. With
X-ray diffraction, the scattering factors are 
simply proportional to the number of elec¬ 
trons in each atom, normally equal to the 
atomic number. Given that elastically scat¬ 
tered electrons are in effect diffracted by the 
electrical potential inside atoms, the scatter¬ 
ing factor for electrons depends not only on 
the nuclear charge but also on the size of the 
surrounding electron cloud that screens the 
nuclear charge. As a result, electron scatter¬ 
ing factors in the resolution range of interest 
in macromolecular structure determination 
(up to 1/3 Å⁻¹) are sensitive to the effective
radius of the outer valency electrons and 
therefore depend sensitively on the chemis¬ 
try of bonding. Although this is a fascinating 
field in itself, with interesting work already 
carried out by the gas phase electron diffrac¬ 
tion community [e.g., Hargittai and Hargit- 
tai (19)], it is still an area where much work 
remains to be done. At present, it is probably 
adequate to think of the density obtained in 
macromolecular structure analysis by elec¬ 
tron microscopy as roughly equivalent to the 
electron density obtained by X-ray diffrac¬ 
tion but with the contribution from hydro¬ 
gen atoms being somewhat greater relative 
to carbon, nitrogen, and oxygen. 

Those electrons that are inelastically scat¬ 
tered lose energy to the specimen by a number 
of mechanisms. The energy loss spectrum for a 
typical biological specimen is dominated by 
the large cross section for plasmon scattering 
in the energy range 20-30 eV, with a continuum
in the distribution that decreases toward
higher energies. At discrete high energies, spe¬ 
cific inner electrons in the K shell of carbon, 
nitrogen, or oxygen can be ejected with corre¬ 
sponding peaks in the energy loss spectrum 
appearing at 200-400 eV. Any of these inelas¬ 
tic interactions produces an uncertainty in the 
position of the scattered electron (by Heisen¬ 
berg's uncertainty principle) and, as a result, 
the resolution of any information present in 
the energy loss electron signal extends only to 
low resolutions of around 15 Å (20). Consequently,
the inelastically scattered electrons
are generally considered to contribute little 
except noise to the images. 


4 RADIATION DAMAGE 

The most important consequence of inelastic 
scattering is the deposition of energy into the 
specimen. This is initially transferred to sec¬ 
ondary electrons, which have an average en¬ 
ergy (20 eV) that is 5 or 10 times greater than 
the valency bond energies. These secondary 
electrons interact with other components of 
the specimen and produce numerous reactive 
chemical species, including free radicals. In 
ice-embedded samples, these would be pre¬ 
dominantly highly reactive, hydroxyl free rad¬ 
icals that arise from the frozen water mole¬ 
cules. In turn, these react with the embedded 
macromolecules and create a great variety of 
radiation products such as modified side 
chains, cleaved polypeptide backbones, and a 
host of molecular fragments. From radiation 
chemistry studies, it is known that thiol or 
disulfide groups react more quickly than ali¬ 
phatic groups and that aromatic groups, in¬ 
cluding nucleic acid bases, are the most resis¬ 
tant. Nevertheless, the end effect of the 
inelastic scattering is the degradation of the 
specimen to produce a cascade of heteroge¬ 
neous products, some of which resemble the 
starting structure more closely than others. 
Some of the secondary electrons also escape 
from the surface of the specimen, causing it to 
charge up during the exposure. As a rough 
rule for 100-kV electrons, the dose that can be 
used to produce an image in which the starting 
structure at high resolution is still recognizable
is about 1 e⁻/Å² for organic or biological
materials at room temperature, 5 e⁻/Å² for a
specimen near liquid nitrogen temperature
(-170°C), and 10 e⁻/Å² for a specimen near
liquid helium temperature (4-8 K). However, 
individual experimenters will often exceed 
these doses if they wish to enhance the low 
resolution information in the images that is 
less sensitive to radiation damage. The effects 
of radiation damage attributed to electron ir¬ 
radiation are essentially identical to those 
from X-ray or neutron irradiation for biologi¬ 
cal macromolecules except for the amount of 
energy deposition per useful coherent elasti¬ 
cally scattered event (21). For electrons scat¬ 
tered by biological structures at all electron 
energies of interest, the number of inelastic 
events exceeds the number of elastic events by 
a factor of 3 to 4, so that 60 to 80 eV of energy
is deposited for each elastically scattered elec¬ 
tron. This limits the amount of information in 
an image of a single biological macromolecule. 
Consequently, the 3D atomic structure cannot 
be determined from a single molecule but re¬ 
quires the averaging of the information from 
at least 10,000 molecules in theory, and even 
more in practice (21). Crystals used for X-ray 
or neutron diffraction contain many orders of 
magnitude more molecules. 
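
The 60 to 80 eV figure follows from two numbers given above: roughly 20 eV transferred per inelastic event and 3 to 4 inelastic events per elastic event. The Python sketch below reproduces that arithmetic. The constant and function names are ours, and treating 20 eV as the energy deposited per inelastic event is the simplification the text itself implies.

```python
# Average energy per inelastic event, taken from the 20 eV quoted in the text.
MEAN_ENERGY_PER_INELASTIC_EVENT_EV = 20.0

def energy_per_elastic_event(inelastic_to_elastic_ratio):
    """Energy (eV) deposited in the specimen per elastically scattered electron."""
    return inelastic_to_elastic_ratio * MEAN_ENERGY_PER_INELASTIC_EVENT_EV

for ratio in (3, 4):
    print(f"ratio {ratio}: {energy_per_elastic_event(ratio):.0f} eV per elastic event")
```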

It is possible to collect both the elastically 
and the inelastically scattered electrons simul¬ 
taneously with an energy analyzer and, if a 
fine electron beam is scanned over the speci¬ 
men, then a scanning transmission electron 
micrograph displaying different properties of 
the specimen can be obtained. Alternatively, 
conventional transmission electron micro¬ 
scopes to which an energy filter has been 
added can be used to select out a certain en¬ 
ergy band of the electrons from the image. 
Both types of microscope can contribute in 
other ways to the knowledge of structure, but 
in this presentation, we concentrate on high 
voltage, phase-contrast electron microscopy of 
unstained macromolecules most often embed¬ 
ded in ice because this is the method of widest 
impact in structural biology. 

5 REQUIRED PROPERTIES OF 
ILLUMINATING ELECTRON BEAM 

The important properties of the image in 
terms of defocus, astigmatism, and the pres¬ 
ence and effect of amplitude or phase contrast 
are discussed later. The best quality incident 
electron beam is produced by a field emission 
gun (FEG). This is because the electrons from 
a FEG are emitted from a very small volume at 
the tip, which is the apparent source size. 
Once these electrons have been collected by 
the condenser lens and used to produce the 
illuminating beam, that beam of electrons is 
then nearly parallel (divergence of ~10⁻²
mrad) and therefore spatially coherent. Similarly,
because the emitting tip of a FEG is not
heated as much as a conventional thermionic 
tungsten source, the thermal energy spread of 
the electrons is relatively small (0.5-1.0 eV) 
and, as a result, the illuminating beam is
closer to being monochromatic. Electron
beams can also be produced by a normal,
heated tungsten source, which gives a less parallel
beam with a larger energy spread, but is
nevertheless adequate for electron cryomicroscopy
if the highest resolution images are
not required.

Figure 14.2. Flow diagram showing all the procedures
involved in electron cryomicroscopy from
sample preparation to map interpretation.
[Legible diagram labels: purified specimen; micrographs; structure-function relationships.]

6 THREE-DIMENSIONAL ELECTRON 
CRYOMICROSCOPY OF 
MACROMOLECULES 

The determination of 3D structure by 
cryo-EM methods follows a common scheme 
for all macromolecules (Fig. 14.2). A more de¬ 
tailed discussion of the individual steps as ap¬ 
plied to different classes of macromolecules 
appears in subsequent sections. Briefly, each 
specimen must be prepared in a relatively homogeneous,
aqueous form (1D or 2D crystals
or a suspension of single particles in a limited
number of states) at relatively high concentration,
rapidly frozen (vitrified) as a thin film,
transferred into the electron microscope, and 
photographed by means of low dose selection 
and focusing procedures. The resulting im¬ 
ages, if recorded on film, must then be digi¬ 
tized. Digitized images are then processed by 
the use of computer programs that allow dif¬ 
ferent views of the specimen to be combined 
into a 3D reconstruction that can be inter¬ 
preted in terms of other available structural, 
biochemical, and molecular data. 


7 OVERVIEW OF CONCEPTUAL STEPS 

Radiation damage by the illuminating elec¬ 
tron beam generally allows only one good pic¬ 
ture (micrograph) to be obtained from each 
molecule or macromolecular assembly. In this
micrograph, the signal-to-noise ratio of the 2D 
projection image is normally too small to accu¬ 
rately determine the projected structure. This 
implies, first, that it is necessary to average 
many images of different molecules taken 
from essentially the same viewpoint to in¬ 
crease the signal-to-noise ratio and, second, 
that many of these averaged projections, 
taken from different directions, must be com¬ 
bined to build up the information necessary to 
determine the 3D structure of the molecule. 
Thus, the two key concepts are: (1) averaging 
to a greater or lesser extent depending on res¬ 
olution, particle size and symmetry to increase 
the signal-to-noise ratio; and (2) the combination
of different projections to build a 3D map
of the structure. 

In addition, there are various technical cor¬ 
rections that must be made to the image data 
to allow an unbiased model of the structure to 
be obtained. These include correction for the 
phase-contrast transfer function (CTF) and, 
at high resolution, for the effects of beam tilt. 
For crystals, it is also possible to combine elec¬ 
tron diffraction amplitudes with image phases 
to produce a more accurate structure (7), and 
in general to correct for loss of high resolution 
contrast for any reason by "sharpening" the 
data by application of a negative temperature 
factor (22). 
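
The "sharpening" step can be illustrated with the conventional temperature-factor weighting, in which each Fourier amplitude at spatial frequency s = 1/d is multiplied by exp(-B s²/4). The Python sketch below assumes that conventional form. The function name and the example B value are ours and are not taken from reference 22.

```python
import math

def sharpening_weight(resolution_angstrom, b_factor_angstrom2):
    """Amplitude scale factor exp(-B * s^2 / 4), with s = 1/d in inverse angstroms."""
    s = 1.0 / resolution_angstrom
    return math.exp(-b_factor_angstrom2 * s * s / 4.0)

# A negative B boosts the amplitudes, and boosts them more at high resolution.
# The value -100 A^2 is purely illustrative.
for d in (20.0, 10.0, 5.0, 4.0):
    print(f"d = {d:4.1f} A  weight = {sharpening_weight(d, -100.0):.3f}")
```

Because the weight grows with resolution, a negative temperature factor restores high resolution contrast but amplifies high resolution noise by the same factor, so the choice of B is a judgment call for each data set.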

The idea of increasing the signal-to-noise 
ratio in electron images of unstained biological
macromolecules by averaging was discussed
in 1971 (23) and demonstrated in 1975
(7, 24), although earlier work on stained specimens
had shown the value of averaging to
increase the signal-to-noise ratio. As in all
repeated measurements, averaging improves
the signal-to-noise ratio by a factor of √N,
where N is the number
of times the measurement is made. The effect
of averaging to produce an improvement in 
signal-to-noise ratio is seen most clearly in the 
processing of images from 2D crystals. Figure 
14.3 shows the results of applying a sequence 
of corrections, beginning with averaging, to 
two-dimensional crystals of bacteriorhodopsin 
in 2D space group p3. The panels show: (a, b) 
2D averaging, (c) correction for the micro¬ 
scope contrast transfer function (CTF), and 
(d) threefold crystallographic symmetry av¬ 
eraging of the phases and combination with 
electron diffraction amplitudes. At each 
stage in the procedure the projected picture 
of the molecules gets clearer. The final stage 
results in a virtually noise-free projected 
structure for the molecule at near atomic 
(3 Å) resolution.
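
The √N behavior is easy to verify numerically. The Python sketch below is a toy model, not the processing applied to Figure 14.3: every pixel carries the same hypothetical signal, Gaussian noise ten times larger is added to each measurement, and the residual noise is estimated after averaging N measurements per pixel. All names and parameter values are assumptions chosen for illustration.

```python
import random
import statistics

def snr_after_averaging(n_images, signal=1.0, noise_sigma=10.0,
                        n_pixels=2000, seed=0):
    """Estimate the signal-to-noise ratio after averaging n_images noisy copies."""
    rng = random.Random(seed)
    averaged = []
    for _ in range(n_pixels):
        # Average n_images independent noisy measurements of the same pixel.
        total = sum(signal + rng.gauss(0.0, noise_sigma) for _ in range(n_images))
        averaged.append(total / n_images)
    residual_noise = statistics.stdev(averaged)
    return signal / residual_noise

single = snr_after_averaging(1)
averaged_100 = snr_after_averaging(100)
print(f"SNR, 1 image:    {single:.2f}")
print(f"SNR, 100 images: {averaged_100:.2f}")
print(f"improvement:     {averaged_100 / single:.1f} (sqrt(100) = 10)")
```

The improvement factor comes out close to 10 for N = 100, with a small statistical scatter that depends on the number of pixels used for the estimate.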

The earliest successful application of the 
idea of combining projections to reconstruct 
the 3D structure of a biological assembly was 
made by DeRosier and Klug (4). The idea is
that each 2D projection corresponds after Fou¬ 
rier transformation to a central section of the 
3D transform of the assembly. If enough inde¬ 
pendent projections are obtained, then the 3D 
transform will have been fully sampled and 
the structure can then be obtained by back 
transformation of the averaged, interpolated, 
and smoothed 3D transform. This procedure 
is shown schematically for a three-dimen¬ 
sional object in the shape of a duck, which rep¬ 
resents the molecule whose structure is being 
determined (Fig. 14.4). 
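
The relationship between projections and central sections can be checked in one dimension lower, where a projection of a 2D array is a 1D profile and the central section is a line through the 2D transform. The Python sketch below uses a plain discrete Fourier transform written out explicitly. The test array and function names are arbitrary, and the code is a demonstration of the theorem rather than a reconstruction program.

```python
import cmath

def dft1(values):
    """Direct 1D discrete Fourier transform."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * j * k / n) for j, v in enumerate(values))
        for k in range(n)
    ]

def dft2(image):
    """2D DFT of image[y][x]; the result is indexed as F[ky][kx]."""
    rows = [dft1(row) for row in image]             # transform along x
    cols = [dft1(list(col)) for col in zip(*rows)]  # transform along y
    return [[cols[kx][ky] for kx in range(len(cols))] for ky in range(len(image))]

def project_along_y(image):
    """Sum the density along y, giving a 1D projection as a function of x."""
    return [sum(col) for col in zip(*image)]

image = [
    [0.0, 1.0, 2.0, 0.0, 0.0],
    [0.0, 3.0, 5.0, 1.0, 0.0],
    [0.0, 0.0, 4.0, 2.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
]

central_section = dft2(image)[0]              # the ky = 0 line of the 2D transform
projection_ft = dft1(project_along_y(image))  # transform of the projection
max_difference = max(abs(a - b) for a, b in zip(central_section, projection_ft))
print(f"largest discrepancy: {max_difference:.2e}")
```

The two results agree to rounding error. In 3D the same argument shows that each projection image fills one central plane of the 3D transform, which is why many differently oriented views are needed to sample it fully.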

In practice, the implementation of these 
concepts has been carried out in a variety of 
ways, given that the experimental strategy 
and type of computer analysis used depend 
on the type of specimen, especially the molec¬ 
ular weight of the individual molecule, its 
symmetry, and whether it assembles into an 
aggregate with one-dimensional (1D), two-dimensional
(2D), or three-dimensional (3D) periodic
order.

Figure 14.3. Display of the results at different
stages of image processing of a digitized micrograph
of a 2D crystal of bacteriorhodopsin. The left panel
(a) shows an area of the raw digitized micrograph in
which only electron noise is visible. The lower right
panel (b) shows the results of the averaging of unit
cells from the whole picture by unbending in real
space and filtering in reciprocal space. The scale of
the density in (b) is the same as that in the original
micrograph, showing that the signal is very much
weaker than the noise. Panel (c) shows the same
density as that in (b) but with contrast increased
10-fold to show that the signal in the original picture
is approximately 10× below the noise level. Panel
(d) shows the density after correction for contrast
transfer function (CTF) attributed in this case to a
defocus of 6000 Å. Panel (e) shows the density after
further threefold crystallographic averaging (the
space group is p3) and replacement of image amplitudes
by electron diffraction amplitudes. Panel (e)
therefore shows an almost perfect atomic resolution
image of the projected structure of bacteriorhodopsin.
The trimeric rings of molecules are centered on
the crystallographic threefold axis and the internal
structure shows α-helical segments in the protein.

Figure 14.4. Schematic diagram to illustrate the
principle of 3D reconstruction. Each 2D projected 
image, as recorded on the micrograph and after CTF 
correction, represents a section through the 3D Fou¬ 
rier transform. This is called the projection theo¬ 
rem. After accumulation of enough information 
from enough different views, a 3D map of the struc¬ 
ture can be calculated by Fourier inversion. 


8 CLASSIFICATION OF 
MACROMOLECULES 

The symmetry of a macromolecule or su- 
pramolecular complex is the primary determi¬ 
nant of how specimen preparation, micros¬ 
copy, and 3D image reconstruction are 
performed. The classification of molecules according
to their level of periodic order and
symmetry (Table 14.1) provides a logical and 
convenient way to consider the means by 
which specimens are studied in 3D by micros¬ 
copy. 

Each type of specimen offers a unique set of 
challenges in obtaining 3D structural infor¬ 
mation at the highest possible resolution. The 
best resolutions achieved by 3D EM methods 
to date, at about 3-4 Å, have been obtained
with several thin, 2D crystals, in large part 
because of their excellent order. 

With the exception of true 3D crystals, 
which must be sectioned to make them thin 
enough to study by transmission electron mi¬ 
croscopy, the resolutions obtained with biolog¬ 
ical specimens are generally dictated by the 
preservation of periodic order, and the sym¬ 
metry and complexity of the object. Hence, 
studies of the helical acetylcholine receptor 
tubes (36), the icosahedral hepatitis B virus 
capsid (44), the 50S ribosome (45), and the
centriole (26) have yielded 3D density maps at
resolutions of 4.6, 7.4, 15, and 280 Å, respectively.

If high resolution were the sole objective of 
EM, it would be necessary, given the capabili¬ 
ties of existing technology, to try to form well- 
ordered 2D crystals or helical assemblies of 
each macromolecule of interest. Indeed, a 
number of different crystallization techniques 
have been devised [e.g., Horne and Pasquali-Ronchetti
(46); Yoshimura et al. (47); Kornberg
and Darst (48); Jap et al. (49); Kubalek et
al. (50); Rigaud et al. (51); Hasler et al. (52);
Reviakine et al. (53); Wilson-Kubalek et al.
(54)], and some of these have yielded new
structural information about otherwise recal¬ 
citrant molecules like RNA polymerase (55). 
However, despite the obvious technological 
advantages of having a molecule present in a 
highly ordered form, most macromolecules 
function not as highly ordered crystals or he¬ 
lices but instead as single particles (e.g., many 
enzymes) or, more likely, in concert with other 
macromolecules as occurs in supramolecular 
assemblies. Also, crystallization tends to con¬ 
strain the number of conformational states a 
molecule can adopt, and the crystal conformation
might not be functionally relevant.
Hence, although resolution may be restricted
to much below that realized in the bulk of current
X-ray crystallographic studies, cryo-EM
methods provide a powerful means to study 
molecules that resist crystallization in ID, 2D, 
or 3D. These methods allow one to explore the 
dynamic events, different conformational 
states (as induced, for example, by altering the
microenvironment of the specimen), and mac- 
romolecular interactions that are the key to 
understanding how each macromolecule func¬ 
tions. 


9 SPECIMEN PREPARATION 

The goal in preparing specimens for cryomi¬ 
croscopy is to keep the biological sample as 
close as possible to its native state to preserve 
the structure to atomic or near-atomic resolu¬ 
tion in the microscope and during microscopy. 
The methods by which numerous types of 
macromolecules and macromolecular com¬ 
plexes have been prepared for cryo-EM studies 
are now well established (9, 56, 57). Most such
methods involve cooling samples at a rate fast 
enough to permit vitrification (solid, glasslike 
state) rather than crystallization of the bulk 
water. Noncrystalline biological macromole¬ 
cules are typically vitrified by applying a small 
(often <10 µL) aliquot of a purified, approximately
0.2-5 mg/mL suspension of sample to
an EM grid coated with a carbon or holey car¬ 
bon support film. The grid, secured with a pair 
of forceps and suspended over a container of 
ethane or propane cryogen slush (maintained 
near its freezing point by a reservoir of liquid 
nitrogen), is blotted nearly dry with a piece of 
filter paper. The grid is then plunged into the 
cryogen, and the sample, if thin enough (~0.2
µm or less), is vitrified in millisecond or
shorter time periods (58-60). 

The ability to freeze samples with a time 
resolution of milliseconds affords cryo-EM one 
of its unique and, as yet, perhaps most under¬ 
utilized advantages: capturing and visualizing 
dynamic structural events that occur over 
time periods of a few milliseconds or longer. 
Several devices that allow samples to be per¬ 
turbed in a variety of ways as they are plunged 
into cryogen have been described [e.g., Subramaniam
et al. (61); Berriman and Unwin (59);
Siegel et al. (62); Trachtenberg (63); White et
al. (60)]. Examples of the use of such devices
include spraying acetylcholine onto its receptor
to cause the receptor channel to open (64),
or lowering the pH of an enveloped virus sample
to initiate early events of viral fusion (65),
or inducing a temperature jump with a flash-tube
system to study phase transitions in liposomes
(66), or mixing myosin S1 fragments
with F-actin to examine the geometry of the
crossbridge powerstroke in muscle (67).

Table 14.1 Classification of Macromolecules According to Periodic Order and Symmetry

Periodic                                                                                           Representative
Order     Type                         Symmetry       Example Macromolecule/Complex                Reference
--------  ---------------------------  -------------  -------------------------------------------  --------------
0D        Point group                  C1             Ribosome                                     25
                                       C9             Centriole                                    26
                                       C6             Bacteriophage φ29 head                       27
                                       C8             Ribonucleoprotein vault                      28
                                       C17            TMV disk                                     29
                                       D2             β-galactosidase                              30
                                       D8             Clathrin coats                               31
                                       D6             Lumbricus terrestris hemoglobin              32
                                       T              Dps protein                                  33
                                       O              Azotobacter pyruvate dehydrogenase core      34
                                       I              Icosahedral viruses                          17
1D        Screw axis (helical)^a                      Acto-myosin filament                         35
                                                      Acetylcholine receptor tubes                 36
                                                      Microtubule                                  37
                                                      Bacterial flagella                           38
                                                      Tobacco mosaic virus                         39
2D        2D space group (2D crystal)  p3             Bacterial rhodopsin membrane                 11
                                       p42₁2          Aquaporin membrane                           40
                                       p6             Gap junction membrane                        41
                                       p321           Light harvesting complex II                  12
                                       p12₁           Tubulin sheet                                13
3D        3D space group (3D crystal)  P2₁2₁2₁        Myosin S1 protein crystal                    42
                                       P6₅ or P6₄     Insect flight muscle                         43

^a The symmetry of a helical structure is defined by an n_m screw axis, which combines a rotation of 2π/n radians about an axis followed by a translation of m/n of the repeat distance.
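
The helical (screw-axis) symmetry listed for 1D specimens in Table 14.1 combines a rotation of 2π/n radians with a translation of m/n of the repeat distance. The Python sketch below generates subunit positions under that rule. The function name, radius, repeat distance, and the choice n = 6, m = 1 are hypothetical values for illustration and do not describe any specimen in the table.

```python
import math

def helical_positions(n, m, repeat, radius, count):
    """Positions generated by repeated application of an n_m screw operation.

    Each step rotates by 2*pi/n about the z axis and translates by
    (m/n) * repeat along it.
    """
    positions = []
    for k in range(count):
        angle = 2.0 * math.pi * k / n
        z = k * (m / n) * repeat
        positions.append((radius * math.cos(angle), radius * math.sin(angle), z))
    return positions

# Hypothetical 6_1 screw axis with a 60 A repeat and a 25 A radius.
subunits = helical_positions(n=6, m=1, repeat=60.0, radius=25.0, count=7)
for x, y, z in subunits:
    print(f"{x:8.2f} {y:8.2f} {z:8.2f}")
```

After n applications the subunit returns to its starting azimuth, displaced along the axis by m repeat distances, which is the defining property of the screw axis.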

Crystalline (2D) samples fortunately can often be prepared for cryo-EM by means of simpler procedures, and vitrification of the bulk water is not always essential to achieve success (68). Such specimens may be applied to the carbon film on an EM grid by normal adhesion methods, washed with 1-2% solutions of solutes like glucose, trehalose, or tannic acid, blotted gently with filter paper to remove excess solution, air dried, loaded into a cold holder, inserted into the microscope, and, finally, cooled to liquid nitrogen temperature.

10 MICROSCOPY 

Once the vitrified specimen is inserted into the microscope and sufficient time is allowed (~15 min) for the specimen stage to stabilize to minimize drift and vibration, microscopy is performed to generate a set of images that, with suitable processing procedures, can later be used to produce a reliable 3D reconstruction of the specimen at the highest possible resolution. To achieve this goal, imaging must be performed at an electron dose that minimizes beam-induced radiation damage to the specimen, with the objective lens of the microscope defocused to enhance phase contrast from the weakly scattering, unstained biological specimen, and under conditions that keep the specimen below the devitrification temperature and minimize its contamination.

The microscopist locates specimen areas suitable for photography by searching the EM grid at very low magnification (<3000×) to keep the irradiation level very low (<0.05 e⁻/Å²) while assessing sample quality. In microscopes operated at 200 keV or higher, where image contrast is very weak, it is helpful to perform the search procedure with the assistance of a CCD camera or a video-rate TV-intensified camera system. For some specimens, like thin 2D crystals, searching is conveniently performed by viewing the low magnification, high contrast image produced by slightly defocusing the electron diffraction pattern by use of the diffraction lens.

After a desired specimen area is identified, the microscope is switched to high magnification mode for focusing and astigmatism correction. These adjustments are typically performed in a region about 2-10 μm away from the chosen area at the same or higher magnification than that used for photography. The choice of magnification, defocus level, accelerating voltage, beam coherence, electron dose, and other operating conditions is dictated by several factors. The most significant ones are the size of the particle or crystal unit cell being studied, the anticipated resolution of the images, and the requirements of the image processing needed to compute a 3D reconstruction to the desired resolution. For most specimens at required resolutions from 3 to 30 Å, images are typically recorded at 25,000-50,000× magnification, with an electron dose of between 5 and 20 e⁻/Å². These conditions yield micrographs of sufficient optical density (OD 0.2-1.5) and image resolution for subsequent image-processing steps. Most modern EMs provide some mode of low dose operation for imaging beam-sensitive, vitrified biological specimens.

The intrinsic low contrast of unstained specimens makes it impossible to observe and focus on specimen details directly, as is routine with stained or metal-shadowed specimens. Focusing, aimed to enhance phase contrast in the recorded images but minimize beam damage to the desired area, is achieved by judicious defocusing on a region that is adjacent to the region to be photographed and preferably situated on the microscope tilt axis. The appropriate focus level is set by adjusting the appearance of either the Fresnel fringes that occur at the edges of holes in the carbon film or the "phase granularity" from the carbon support film.

Unfortunately, electron images do not give a direct rendering of the specimen density distribution. The relationship between image and specimen is described by the contrast transfer function (CTF), which is characteristic of the particular microscope used, the specimen, and the conditions of imaging. The microscope CTF arises from the objective lens focal setting and from the spherical aberration present in all electromagnetic lenses and varies with the defocus and accelerating voltage according to a formula (see below) that includes both phase- and amplitude-contrast components. First, however, it might be useful to describe briefly the essentials of amplitude contrast and phase contrast, two concepts carried over from optical microscopy. Amplitude contrast refers to the nature of the contrast in an image of an object that absorbs the incident illumination or scatters it in any other way, so that a proportion of it is lost. As a result, the image appears darker where greater absorption occurs. Phase contrast is required if an object is transparent (i.e., it is a pure phase object) and does not absorb but only scatters the incident illumination. Biological specimens for cryo-EM are almost pure phase objects and the scattering is relatively weak, so that the simple theory of image formation by a weak phase object applies (69, 70). An exactly in-focus image of a phase object has no contrast variation because all the scattered illumination is focused back to equivalent points in the image of the object from which it was scattered. In optical microscopy, the use of a quarter wave plate can retard the phase of the direct unscattered beam, so that an in-focus image of a phase object has very high "Zernike" phase contrast. However, there is as yet no simple quarter wave plate for electrons, so instead, phase contrast is created by introducing phase shifts into the diffracted beams by adjustment of the excitation of the objective lens so that the image is slightly defocused. In addition, because all matter is composed of atoms and the electric potential inside each atom is very high near the nucleus, even the electron scattering behavior of the light atoms found in biological molecules deviates from that of a weak phase object; for a deeper discussion of this the reader should refer to Reimer (70) or Spence (69). In practice, the proportion of "amplitude" contrast is about 7% at 100 kV, 5% at 200 kV, and 4% at 300 kV for low dose images of protein molecules embedded in ice.


The overall dependency of the CTF on resolution, wavelength, defocus, and spherical aberration is given by

$$\mathrm{CTF}(\nu) = -\left\{ \left(1 - F_{\mathrm{amp}}^{2}\right)^{1/2} \sin[\chi(\nu)] + F_{\mathrm{amp}} \cos[\chi(\nu)] \right\}$$

where $\chi(\nu) = \pi\lambda\nu^{2}\left(\Delta f - 0.5\,C_s\lambda^{2}\nu^{2}\right)$; $\nu$ is the spatial frequency (in Å⁻¹); $F_{\mathrm{amp}}$ is the fraction of amplitude contrast; $\lambda$ is the electron wavelength (in Å), where

$$\lambda = 12.3 / \sqrt{V + 0.000000978\,V^{2}}$$

(= 0.037, 0.025, and 0.020 Å for 100, 200, and 300 keV electrons, respectively); $V$ is the voltage (in volts); $\Delta f$ is the underfocus (in Å); and $C_s$ is the spherical aberration of the objective lens of the microscope (in Å).
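
As a numerical illustration, the wavelength and CTF expressions above can be evaluated directly. The following is a minimal sketch, not taken from any image-processing package; the function names and the default value of f_amp are assumptions made for illustration, and all lengths are in angstroms as in the definitions above.

```python
import math


def electron_wavelength(volts):
    """Electron wavelength in angstroms for an accelerating voltage in volts."""
    return 12.3 / math.sqrt(volts + 0.978e-6 * volts ** 2)


def ctf(nu, volts, underfocus, cs, f_amp=0.07):
    """CTF at spatial frequency nu (1/angstrom).

    underfocus and cs are both in angstroms; f_amp is the fraction of
    amplitude contrast (the default is an illustrative choice).
    """
    lam = electron_wavelength(volts)
    chi = math.pi * lam * nu ** 2 * (underfocus - 0.5 * cs * lam ** 2 * nu ** 2)
    return -(math.sqrt(1.0 - f_amp ** 2) * math.sin(chi) + f_amp * math.cos(chi))


if __name__ == "__main__":
    for kv in (100, 200, 300):
        print(kv, "kV wavelength (A):", round(electron_wavelength(kv * 1.0e3), 3))
    # 200 kV, 1 um underfocus (1e4 A), Cs = 2 mm (2e7 A), 5% amplitude contrast
    for spacing in (40.0, 20.0, 10.0):
        print(spacing, "A:", ctf(1.0 / spacing, 200.0e3, 1.0e4, 2.0e7, f_amp=0.05))
```

At zero spatial frequency the phase term vanishes, so the function returns -f_amp, which is a quick sanity check on the sign convention.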

In addition, this CTF is attenuated by an envelope or damping function, which depends on the coherence of the beam, specimen drift, and other factors (6, 71, 72). Figure 14.5 shows a few representative CTFs for different amounts of defocus on a normal and a FEG microscope. Thus, for a particular defocus setting of the objective lens, phase contrast in the electron image is positive and maximal only at a few specific spatial frequencies. Contrast is either lower than maximal, completely absent, or it is opposite (inverted or reversed) from that at other frequencies. Hence, as the objective lens is focused, the electron microscopist selectively accentuates image details of a particular size.

Figure 14.5. Representative plots of the contrast transfer function (CTF) as a function of spatial frequency, for two different defocus settings (0.7 and 4.0 μm underfocus) and for a field emission (light curve) or tungsten (dark curve) electron source. All plots correspond to electron images formed in an electron microscope operated at 200 kV and with objective lens aberration coefficients, C_s = C_c = 2.0 mm, and assuming amplitude contrast of 4.8% (73). The spatial coherence, which is related to the electron source size and expressed as β, the half-angle of illumination, for tungsten and FEG electron sources was fixed at 0.3 and 0.015 milliradians, respectively. Likewise, the temporal coherence (expressed as ΔE, the energy spread) was fixed at 1.6 and 0.5 eV for tungsten and FEG sources. The combined effects of the poorer spatial and temporal coherence of the tungsten source lead to a significant dampening, and hence loss of contrast, of the CTF at progressively higher resolutions compared to that observed in FEG-equipped microscopes. The greater number of contrast reversals with higher defocus arises because of the greater out-of-focus phase shifts.

Images are typically recorded 0.8-3.0 μm underfocus to enhance specimen features in the 20-40 Å size range and thereby facilitate phase origin and specimen orientation search procedures carried out in the image-processing steps. However, this level of underfocus also enhances the contrast envelope in lower resolution maps, which may help in interpretation. To obtain results at better than 10-15 Å resolution, it is essential to record, process, and combine data from several micrographs that span a range of defocus levels [e.g., Unwin and Henderson (7); Böttcher et al. (44)]. This strategy ensures good information transfer at all spatial frequencies up to the limiting resolution but requires careful compensation for the effects of the microscope CTF during image processing. Also, the recording of image focal pairs or focal series from a given specimen area can be beneficial in determining origin and orientation parameters for processing of images of single particles [e.g., Cheng et al. (74); Trus et al. (75)].
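
The link between the quoted underfocus range and the 20-40 Å feature size can be checked with a short calculation. The sketch below assumes a simplification that the text does not state: spherical aberration and amplitude contrast are neglected, so the phase term reduces to χ = πλν²Δf, and the first contrast maximum falls where χ = π/2. The function names are illustrative only.

```python
import math


def first_peak_spacing(underfocus, wavelength):
    """Spacing (angstroms) of the first phase-contrast maximum.

    Solves pi * wavelength * nu**2 * underfocus = pi / 2 for 1 / nu,
    neglecting spherical aberration and amplitude contrast.
    Both arguments are in angstroms.
    """
    return math.sqrt(2.0 * wavelength * underfocus)


def first_zero_spacing(underfocus, wavelength):
    """Spacing (angstroms) of the first CTF zero under the same simplification."""
    return math.sqrt(wavelength * underfocus)


if __name__ == "__main__":
    wavelength_200kv = 0.025  # angstroms, value quoted in the text
    for underfocus_um in (0.8, 1.0, 3.0):
        underfocus = underfocus_um * 1.0e4  # micrometres to angstroms
        print(underfocus_um, "um:", first_peak_spacing(underfocus, wavelength_200kv))
```

Under these assumptions, 0.8 and 3.0 μm underfocus at 200 kV place the first maximum near 20 and 39 Å, in line with the range given above.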

Many high resolution cryo-EM studies are now performed with microscopes operated at 200 keV or higher and with field emission gun (FEG) electron sources [e.g., Zemlin (76, 78); Zhou and Chiu (77); Mancini et al. (79)]. The high coherence of a FEG source ensures that phase contrast in the images remains strong out to high spatial frequencies (>1/3.5 Å⁻¹), even for highly defocused images. The use of higher voltages provides potentially higher resolution [greater depth of field (i.e., less curvature of the Ewald sphere) attributed to smaller electron beam wavelength], better beam penetration (less multiple scattering), reduced problems with specimen charging that plague microscopy of unstained or uncoated vitrified specimens (80), and reduced phase shifts associated with beam tilt.

Images are recorded on photographic film or on a CCD camera with either flood beam or spot-scan procedures. Film, with its advantages of low cost, large field of view, and high resolution (~10 μm), has remained the primary image recording medium for most cryo-EM applications, despite disadvantages of high background fog and need for chemical development and digitization. CCD cameras provide image data directly in digital form and with very low background noise, but suffer from higher cost, limited field of view, limited spatial resolution caused by poor point spread characteristics, and a fixed pixel size (typically between 14 and 24 μm). They are useful, for example, for precise focusing and adjustment of astigmatism [e.g., Krivanek and Mooney (81); Sherman et al. (82)].

For studies in which specimens must be tilted to collect 3D data, such as with 2D crystals, or single particles that adopt preferred orientations on the EM grid, or specimens requiring tomography, microscopy is performed in essentially the same way as described above. However, the limited tilt range (±60-70°) of most microscope goniometers can lead to nonisotropic resolution in the 3D reconstructions (the "missing cone" problem), and tilting generates a constantly varying defocus across the field of view in a direction normal to the tilt axis. The effects caused by this varying defocus level must be corrected in high resolution applications.
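
The size of the defocus gradient on a tilted specimen follows from simple geometry. The sketch below assumes a flat specimen and measures distance in the image plane, perpendicular to the tilt axis; a point at projected distance x from the axis then differs in height (and hence defocus) by x·tan(θ). The function name is hypothetical.

```python
import math


def defocus_offset(distance_from_axis, tilt_deg):
    """Defocus difference relative to the tilt axis for a flat, tilted specimen.

    distance_from_axis is the projected distance in the image plane, measured
    perpendicular to the tilt axis; the result has the same length unit.
    """
    return distance_from_axis * math.tan(math.radians(tilt_deg))


if __name__ == "__main__":
    # Points 1 um either side of the tilt axis at 60 degrees of tilt
    for x_um in (-1.0, 0.0, 1.0):
        print(x_um, "um from axis:", defocus_offset(x_um, 60.0), "um")
```

At 60° tilt a point 1 μm from the axis is about 1.7 μm away from the focus level on the axis, which is comparable to the total underfocus used for recording and so cannot be ignored at high resolution.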

11 SELECTION AND PREPROCESSING 
OF DIGITIZED IMAGES 

Before any image analysis or classification of the molecular images can be done, a certain amount of preliminary checking and normalization is required to ensure there is a reasonable chance that a homogeneous population of molecular images has been obtained. First, good quality micrographs are selected in which the electron exposure is correct, there is no image drift or blurring, and there is minimal astigmatism and a reasonable amount of defocus to produce good phase contrast. This is usually done by visual examination and optical diffraction.

Once the best pictures have been chosen, the micrographs must be scanned and digitized on a suitable densitometer. The sizes of the steps between digitization of optical density and the size of the sample aperture over which the optical density is averaged by the densitometer must be sufficiently small to sample the detail present in the image at fine enough intervals (83). Normally, a circular (or square) sample aperture of diameter (or length of side) equal to the step between digitizations is used. This avoids digitizing overlapping points, without missing any of the information recorded in the image. The size of the sample aperture and digitization step depends on the magnification selected and the resolution required. A value of 1/4 to 1/3 of the required limit of resolution (measured in μm on the emulsion) is normally ideal because it avoids having too many numbers (and therefore wasting computer resources), without losing anything during the measurement procedure. For a 40,000× image, on which a resolution of 10 Å at the specimen is required, a step size of 10 μm {= 1/4 × [(10 Å × 40,000)/(10,000 Å/μm)]} would be suitable.
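
The step-size arithmetic above generalizes to any magnification and target resolution. The following minimal sketch reproduces the worked example; the function name and the default sampling fraction of 1/4 are illustrative choices, not part of any densitometer software.

```python
def scan_step_um(resolution_angstrom, magnification, fraction=0.25):
    """Densitometer step size in micrometres on the emulsion.

    resolution_angstrom is the required resolution at the specimen; fraction
    is the sampling fraction of that resolution (1/4 to 1/3 in the text).
    """
    resolution_on_film_um = resolution_angstrom * magnification / 1.0e4
    return fraction * resolution_on_film_um


if __name__ == "__main__":
    print(scan_step_um(10.0, 40000))            # the worked example in the text
    print(scan_step_um(10.0, 40000, 1.0 / 3.0))  # coarser end of the range
```

For the example in the text (10 Å at 40,000×) the finer sampling gives 10 μm and the coarser sampling about 13 μm.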

The best area of an image of a helical or 2D crystal specimen can then be boxed off using a soft-edge mask. For images of single particles, a stack of individual particles can be created by selecting out many small areas surrounding each particle. Because, in the later steps of image processing, the orientation and position of each particle are refined by comparing the amplitudes and phases of their Fourier components, it is important to remove spurious features around the edge of each particle and to make sure the different particle images are on the same scale. This is normally done by masking off a circular area centered on each particle and floating the density so that the average around the perimeter becomes zero (83). The edge of the mask is apodized by applying a soft cosine bell shape to the original densities so they taper toward the background level. Finally, to compensate for variations in the exposure attributed to ice thickness or electron dose, most microscopists normalize the stack of individual particle images so that the mean density and mean density variation over the field of view are set to the same values for all particles (84).
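
The floating, apodizing, and normalizing operations just described can be sketched in a few lines. This is a simplified illustration under stated assumptions: the image is a small square list of lists, the "perimeter" is taken as the pixels within half a pixel of the mask radius, and normalization sets the mean to 0 and the standard deviation to 1. The function names and these particular choices are hypothetical, not those of any published package.

```python
import math


def float_and_mask(image, radius, taper):
    """Float a square image to zero mean on the mask perimeter, then apply a
    circular mask whose edge falls off as a cosine bell over `taper` pixels."""
    n = len(image)
    centre = (n - 1) / 2.0
    dist = [[math.hypot(y - centre, x - centre) for x in range(n)] for y in range(n)]
    ring = [image[y][x] for y in range(n) for x in range(n)
            if abs(dist[y][x] - radius) <= 0.5]
    if not ring:
        raise ValueError("mask radius does not intersect the image")
    edge_mean = sum(ring) / len(ring)
    masked = []
    for y in range(n):
        row = []
        for x in range(n):
            r = dist[y][x]
            if r <= radius:
                weight = 1.0
            elif r < radius + taper:
                weight = 0.5 * (1.0 + math.cos(math.pi * (r - radius) / taper))
            else:
                weight = 0.0
            row.append((image[y][x] - edge_mean) * weight)
        masked.append(row)
    return masked


def normalize(image):
    """Rescale an image to zero mean and unit standard deviation."""
    values = [v for row in image for v in row]
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if sd == 0.0:
        raise ValueError("image has no density variation")
    return [[(v - mean) / sd for v in row] for row in image]
```

Applying the same two functions to every boxed particle puts the whole stack on a common scale before alignment.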

Once some good particles or crystalline areas for 1D or 2D crystals have been selected, digitized, masked, and their intensity values normalized, true image processing can begin.


12 IMAGE PROCESSING AND 3D 
RECONSTRUCTION 

Although the general concepts of signal averaging, together with combining different views to reconstruct the 3D structure, are common to the different computer-based procedures that have been implemented, it is important to emphasize one or two preliminary points. First, a homogeneous set of particles must be selected for inclusion in the 3D reconstruction. This selection may be made by eye, to eliminate obviously damaged particles or impurities, or by the use of multivariate statistical analysis (85) or some other classification scheme. This allows a subset of the particle images to be used to determine the structure of a better defined entity. All image-processing procedures require the determination of the same parameters that are needed to specify unambiguously how to combine the information from each micrograph or particle. These parameters are: the magnification, defocus, astigmatism, and, at high resolution, the beam tilt for each micrograph; the electron wavelength used (i.e., accelerating voltage of the microscope); the spherical aberration coefficient (C_s) of the objective lens; and the orientation and phase origin for each particle or unit cell of the 1D, 2D, or 3D crystal. There are 13 parameters for each particle, of which eight may be common to each micrograph and two or three (C_s, kV, magnification) to each microscope. The different general approaches that have been used in practice to determine the 3D structure of different classes of macromolecular assemblies from one or more electron micrographs are listed in Table 14.2.
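
One way to keep track of these parameters in software is to group them by scope. The sketch below is only a bookkeeping illustration. The text does not itemize the 13 parameters, so the breakdown used here (defocus and astigmatism as two defocus values plus an angle, beam tilt as two components, orientation as three angles, origin as two coordinates) is an assumption chosen to match the stated totals, and all field names are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class MicrographParams:
    """Eight values that may be shared by all particles on one micrograph."""
    magnification: float
    defocus_1: float          # defocus along the first principal direction
    defocus_2: float          # defocus along the second principal direction
    astigmatism_angle: float  # orientation of the astigmatism axes
    beam_tilt_x: float
    beam_tilt_y: float
    voltage_kv: float         # fixes the electron wavelength
    cs_mm: float              # spherical aberration coefficient


@dataclass
class ParticleParams:
    """Five values specific to one particle, plus its micrograph record."""
    micrograph: MicrographParams
    euler_1: float
    euler_2: float
    euler_3: float
    origin_x: float
    origin_y: float


def parameter_count():
    """Total number of scalar parameters describing one particle."""
    per_particle = len(fields(ParticleParams)) - 1  # exclude the micrograph link
    return per_particle + len(fields(MicrographParams))
```

Sharing one MicrographParams record among all particles boxed from the same micrograph reflects the statement that eight of the values may be common to each micrograph.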

The precise way in which each general approach codes and determines the particle or unit cell parameters varies greatly and is not described in detail. Much of the computer software used in image reconstruction studies is relatively specialized compared to that used in the more mature field of macromolecular X-ray crystallography. In part, this may be attributed to the large diversity of specimen types amenable to cryo-EM and reconstruction methods. As a consequence, image-reconstruction software is evolving quite rapidly, and references to software packages cited in Table 14.2 are likely to become quickly outdated. Extensive discussion of algorithms and software packages in use at this time may be found in a number of recent special issues of the Journal of Structural Biology [volumes 116(1), 120(3), 121(2), and 125(2/3)].

In practice, attempts to determine or refine some parameters may be affected by the inability to determine accurately one of the other parameters. The solution of the structure is therefore an iterative procedure in which reliable knowledge of the parameters that describe each image is gradually built up to produce an increasingly accurate structure, until no more information can be squeezed out of the micrographs. At this point, if any of the origins or orientations is wrongly assigned, there will be a loss of detail and signal-to-noise ratio in the map. If a better determined or higher resolution structure is required, it would then be necessary to record images on a better microscope or to prepare new specimens and record better pictures.

Table 14.2 Methods of Three-Dimensional Image Reconstruction

Structure type                  Method                                                  Reference(s) to technical/
(symmetry)                                                                              theoretical details
------------------------------  ------------------------------------------------------  --------------------------
Asymmetric                      Random conical tilt                                     18, 87, 88
(Point group C1)                  * Software package                                    89
                                Angular reconstitution                                  32, 90
                                  * Software package                                    91
                                Weighted back projection                                92, 93
                                Radon transform alignment                               94
                                Reference-based alignment                               95
                                Reference-free alignment                                96, 97
                                Fourier reconstruction and alignment                    98
                                Tomographic tilt series and remote control of           99-102
                                  microscope (a)
Symmetric                       Angular reconstitution                                  32, 90
(Point groups Cn, Dn; n > 1)      * Software packages                                   91, 103
                                Fourier-Bessel synthesis                                27
                                Reference-based alignment and weighted back projection  104
Icosahedral                     Fourier-Bessel synthesis (common-lines)                 79, 105-107
(Point group I)                 Reference-based alignment                               108-111
                                  * Software packages                                   112-116
                                Angular reconstitution; tomographic tilt series         90, 117
Helical                         Fourier-Bessel synthesis                                4, 36, 83, 119-123
                                  * Software packages and filament straightening        112, 123-126
                                    routines
2D Crystal                      Random azimuthal tilt                                   11, 15, 24, 127, 128
                                  * Software packages                                   112, 129
3D Crystal                      Oblique section reconstruction                          43, 130
                                  * Software package                                    131
                                Sectioned 3D crystal                                    42

(a) Note: Electron tomography is the subject of an entire issue of J. Struct. Biol. [120, 207-395 (1997)] and a book edited by Frank (132).

The reliability and resolution of the final reconstruction can be measured by use of a variety of indices. For example, the differential phase residual (DPR) (133), the Fourier shell correlation (FSC) (134), and the Q-factor (135) are three such measures. DPR is the mean phase difference, as a function of resolution, between the structure factors from two independent reconstructions, often calculated by splitting the image data into two halves. FSC is a similar calculation of the mean correlation coefficient between the complex structure factors of the two halves of the data as a function of resolution. The Q-factor is the mean ratio of the vector sum of the individual structure factors from each image divided by the sum of their moduli, again calculated as a function of resolution. Perfectly accurate measurements would have values of DPR, FSC, and Q-factor of 0°, 1.0, and 1.0, respectively, whereas random data containing no information would have values of 90°, 0.0, and 0.0. The spectral signal-to-noise ratio (SSNR) criterion has been advocated as the best of all (136): it effectively measures, as a function of resolution, the overall signal-to-noise ratio (squared) of the whole of the image data. It is calculated by taking into consideration how well all the contributing image data agree internally.
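
The three indices defined above can be illustrated for a single resolution shell. The sketch below operates on plain lists of complex structure factors and follows the verbal definitions in the text; in particular, the phase measure is an unweighted mean absolute phase difference, which is a simplification of the published DPR (133). The function names are hypothetical.

```python
import cmath
import math


def fsc_shell(half_1, half_2):
    """Correlation coefficient between two sets of complex structure factors
    from the same resolution shell (one value per Fourier component)."""
    cross = sum((a * b.conjugate()).real for a, b in zip(half_1, half_2))
    power_1 = sum(abs(a) ** 2 for a in half_1)
    power_2 = sum(abs(b) ** 2 for b in half_2)
    return cross / math.sqrt(power_1 * power_2)


def mean_phase_difference_deg(half_1, half_2):
    """Unweighted mean absolute phase difference in degrees (simplified DPR)."""
    diffs = [abs(math.degrees(cmath.phase(a * b.conjugate())))
             for a, b in zip(half_1, half_2)]
    return sum(diffs) / len(diffs)


def q_factor(observations):
    """Modulus of the vector sum of repeated measurements of one structure
    factor, divided by the sum of their moduli."""
    return abs(sum(observations)) / sum(abs(f) for f in observations)


if __name__ == "__main__":
    shell = [2 + 1j, -1 + 3j, 0.5 - 2j]
    print(fsc_shell(shell, shell), mean_phase_difference_deg(shell, shell))
    print(q_factor([1 + 1j, 1 + 1j, 1 + 1j]), q_factor([1 + 0j, -1 + 0j]))
```

Identical half data sets give an FSC of 1.0 and a phase difference of 0°, and measurements that cancel exactly give a Q-factor of 0.0, matching the limiting values quoted in the text.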

An example of a typical strategy for determination of the 3D structure of a new and unknown molecule without any symmetry and that does not crystallize might be as follows:

1. Record a single axis tilt series of particles embedded in negative stain, with a tilt range from -60° to +60°.

2. Calculate 3D structures for each particle by use of an R-weighted back-projection algorithm (93).

3. Average 3D data for several particles in real or reciprocal space to get a reasonably good 3D model of the stain-excluding region of the particle.

4. Record a number of micrographs of the particles embedded in vitreous ice.

5. Use the 3D negative stain model obtained in (3) with inverted contrast to determine the rough alignment parameters of the particle in the ice images.

6. Calculate a preliminary 3D model of the average, ice-embedded structure.

7. Use the preliminary 3D model to determine more accurate alignment parameters for the particles in the ice images.

8. Calculate a better 3D model.

9. Determine defocus and astigmatism to allow CTF calculation and correct 3D model so that it represents the structure at high resolution.

10. Keep adding pictures at different defocus levels to get an accurate structure at as high a resolution as possible.

For large single particles with no symmetry or for particles with higher symmetry or for crystalline arrays, it should be possible to omit the negative staining steps and go straight to alignment of particle images from ice-embedding because the particle or crystal tilt angles can be determined internally from comparison of phases along common lines in reciprocal space or from the lattice or helix parameters from a 2D or 1D crystal.

The following discussion briefly outlines for a few specific classes of macromolecule the general strategy for carrying out image processing and 3D reconstruction (see Fig. 14.6).

12.1 2D Crystals 

For 2D crystals, the general 3D reconstruction approach consists of the following steps: First, a series of micrographs of single 2D crystals are recorded at different tilt angles, with random azimuthal orientations. Each crystal is then unbent using cross-correlation techniques, to identify the precise position of each unit cell (127), and amplitudes and phases of the Fourier components of the average of that particular view of the structure are obtained for the transform of the unbent crystal. The reference image used in the cross-correlation calculation can either be a part of the whole image masked off after a preliminary round of averaging by reciprocal space filtering of the regions surrounding the diffraction spots in the transform, or it can be a reference image calculated from a previously determined 3D model. The amplitudes and phases from each image are then corrected for the CTF and beam tilt (11, 22, 127) and merged with data from many other crystals by scaling and origin refinement, taking into account the proper symmetry of the 2D space group of the crystal. Finally, the whole data set is fitted by least squares to constrained amplitudes and phases along the lattice lines (137) before calculating a map of the structure. The initial determination of the 2D space group can be carried out by a statistical test of the phase relationships in one or two images of untilted specimens (138). The absolute hand of the structure is automatically correct, given that the 3D structure is calculated from images whose tilt axis and tilt angle are known. Nevertheless, care must be taken not to make any of a number of trivial mistakes that would invert the hand.
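
The origin refinement used when merging crystals can be illustrated with a toy calculation. The sketch below is an assumption-laden simplification: reflections are stored as a dictionary keyed by (h, k), an origin shift of (x0, y0) in fractional unit-cell coordinates multiplies each structure factor by exp[-2πi(h·x0 + k·y0)] (the sign is a convention chosen here), and the best shift is found by a coarse grid search on the mean phase residual. It ignores scaling, symmetry, and CTF correction, and the function names are hypothetical.

```python
import cmath
import math


def shift_origin(reflections, x0, y0):
    """Apply a phase-origin shift (fractional coordinates) to all reflections."""
    return {(h, k): f * cmath.exp(-2j * math.pi * (h * x0 + k * y0))
            for (h, k), f in reflections.items()}


def mean_phase_residual(set_a, set_b):
    """Mean absolute phase difference (radians) over common reflections."""
    common = [hk for hk in set_a if hk in set_b]
    return sum(abs(cmath.phase(set_a[hk] * set_b[hk].conjugate()))
               for hk in common) / len(common)


def refine_origin(image, reference, steps=8):
    """Grid search for the origin shift that best matches image to reference.

    Returns (residual, x0, y0) for the best grid point.
    """
    best = None
    for i in range(steps):
        for j in range(steps):
            x0, y0 = i / steps, j / steps
            residual = mean_phase_residual(shift_origin(image, x0, y0), reference)
            if best is None or residual < best[0]:
                best = (residual, x0, y0)
    return best
```

A real implementation would refine the shift on a finer grid or by least squares and would weight reflections by their signal-to-noise ratio; the grid search only shows the principle.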

12.2 Helical Particles 

Figure 14.6. Examples of macromolecules studied by cryo-EM and 3D image reconstruction and the resulting 3D structures (bottom row) after cryo-EM analysis. All micrographs (top row) are displayed at about 170,000× magnification and all models at about 1,200,000× magnification. (a) A single particle without symmetry: The micrograph shows 70S E. coli ribosomes complexed with mRNA and fMet-tRNA. The surface-shaded density map, made by averaging 73,000 ribosome images from 287 micrographs, has a resolution (FSC) of 11.5 Å. The 50S and 30S subunits and the tRNA are colored blue, yellow, and green, respectively. The identity of many of the subunits is known and some RNA double helices are clearly recognizable by their major and minor grooves (e.g., helix 44 is shown in red). [Courtesy of J. Frank (SUNY, Albany), using data from Gabashvili et al. (86).] (b) A single particle with symmetry: The micrograph shows hepatitis B virus cores. The 3D reconstruction, at a resolution of 7.4 Å (DPR), was computed from 6384 particle images taken from 34 micrographs. [From Böttcher et al. (44).] (c) A helical filament: The micrograph shows actin filaments decorated with myosin S1 heads containing the essential light chain. The 3D reconstruction, at a resolution of 30-35 Å, is a composite in which the differently colored parts are derived from a series of difference maps that were superimposed on F-actin. The components include: F-actin (blue), myosin heavy chain motor domain (orange), essential light chain (purple), regulatory light chain (white), tropomyosin (green), and myosin motor domain N-terminal beta-barrel (red). [Courtesy of A. Lin, M. Whittaker, and R. Milligan (Scripps Research Institute, La Jolla, CA).] (d) A 2D crystal: light-harvesting complex LHCII at 3.4-Å resolution. The model shows the protein backbone and the arrangement of chromophores in a number of trimeric subunits in the crystal lattice. In this example, image contrast is too low to see any hint of the structure without image processing (see also Fig. 14.3). See color insert. [Courtesy of W. Kühlbrandt (Max-Planck-Institute for Biophysics, Frankfurt, Germany).]

The basic steps involved in processing and 3D reconstruction of helical specimens are as follows. A series of micrographs is recorded of vitrified particles suspended over holes in a perforated carbon support film. The micrographs are digitized and Fourier-transformed to determine image quality (astigmatism, drift, defocus, presence and quality of layer lines, etc.). Individual particle images are boxed, floated, and apodized within a rectangular mask. The parameters of helical symmetry (number of subunits per turn and pitch) must be determined by indexing the computed diffraction patterns. If necessary, simple spline-fitting procedures may be employed to "straighten" images of curved particles (124), and the image data may be reinterpolated (126) to provide more precise sampling of the layer line data in the computed transform. Once a preliminary 3D structure is available, a much more sophisticated refinement of all the helical parameters can be used to unbend the helices onto a predetermined average helix so that the contributions of all parts of the image are correctly treated (123). The layer line data are extracted from each particle transform and two phase origin corrections are made, one to shift the phase origin to the helix axis (at the center of the particle image) and the other to correct for effects caused by having the helix axis tilted out of the plane normal to the electron beam in the electron microscope. The layer line data are separated out into near- and far-side data, corresponding to contributions from the near and far sides of each particle imaged. The relative rotations and translations needed to align the different transforms are determined so the data may be merged and a 3D reconstruction computed by Fourier-Bessel inversion procedures (83). Determination of the absolute hand requires comparison of a pair of images recorded with a small tilt of the specimen between the views (139).
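
The two helical parameters named above (number of subunits per turn and pitch) fix the position of every subunit on the helix. The sketch below generates subunit coordinates from them; the function name is hypothetical, and the example values (49 subunits in 3 turns, 23 Å pitch, often quoted for tobacco mosaic virus, with an arbitrary 40 Å radius) are used purely for illustration and are not taken from this chapter.

```python
import math


def helical_lattice(subunits_per_turn, pitch, radius, n_subunits):
    """Cartesian coordinates of successive subunits on a single-start helix.

    Each subunit is rotated by 2*pi/subunits_per_turn about the helix axis (z)
    and translated by pitch/subunits_per_turn along it.
    """
    twist = 2.0 * math.pi / subunits_per_turn
    rise = pitch / subunits_per_turn
    return [(radius * math.cos(i * twist), radius * math.sin(i * twist), i * rise)
            for i in range(n_subunits)]


if __name__ == "__main__":
    coords = helical_lattice(49.0 / 3.0, 23.0, 40.0, 50)
    print("rise per subunit (A):", coords[1][2])
    print("subunit 49:", coords[49])  # back above subunit 0 after three turns
```

With a non-integral number of subunits per turn, the structure repeats exactly only after several turns, which is why indexing the layer lines in the computed diffraction pattern is needed to establish the symmetry.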

12.3 Icosahedral Particles 

The typical strategy for processing and 3D reconstruction of icosahedral particles consists of the following steps: First, a series of micrographs of a monodisperse distribution of particles, normally suspended over holes in a perforated carbon support film, is recorded. After digitization of the micrographs, individual particle images are boxed and floated with a circular mask. The astigmatism and defocus of each micrograph is measured from the sum of intensities of the Fourier transforms of all particle images (140). Autocorrelation techniques are then used to estimate the particle phase origins, which coincide with the center of each particle, where all rotational symmetry axes intersect (141). The view orientation of each particle, defined by three Eulerian angles, is determined either by means of common and cross-common lines techniques or with the aid of model-based procedures [e.g., Crowther (106); Fuller et al. (107); Baker et al. (17)]. Once a set of self-consistent particle images is available, an initial, low resolution 3D reconstruction is computed by merging these data with Fourier-Bessel methods (106). This reconstruction then serves as a reference for refining the orientation, origin, and CTF parameters of each of the included particle images, for rejecting "bad" images, and for increasing the size of the data set by including new particle images from additional micrographs taken at different defocus levels. A new reconstruction, computed from the latest set of images, serves as a new reference and the above refinement procedure is repeated until no further improvements, as measured by the reliability criteria mentioned above, are made. Determination of the absolute hand of the structure requires the recording and processing of a pair of images taken with a known, small relative tilt of the specimen between the two views (142).


13 VISUALIZATION, MODELING,
AND INTERPRETATION OF RESULTS

Once a reliable 3D map is obtained, computer graphics and other visualization tools may be used as aids in interpreting morphological details and understanding biological function in the context of biochemical and molecular studies and complementary X-ray crystallographic and other biophysical measurements.


14 TRENDS 

The new generation of intermediate voltage (~300 kV) FEG microscopes is
now making it much easier to obtain higher resolution images that, by
use of larger defocus values, have good image contrast at both very low
and very high resolution. The greater contrast at low resolution greatly
facilitates particle-alignment procedures, and the increased contrast
resulting from the high coherence illumination helps to increase the
signal-to-noise ratio for the structure at high resolution. Cold stages
are constantly being improved, with several liquid helium stages now in
operation (143, 144). Two of these are commercially available from JEOL
and FEI/Philips.

Finally, three additional likely trends include: (1) increased
automation, including the recording of micrographs, the use of spotscan
procedures in remote microscope operation (145, 146), and in every
aspect of image processing; (2) production of better electronic cameras
(e.g., CCD or pixel detectors); and (3) increased use of
dose-fractionated, tomographic tilt series, to extend EM studies to the
domain of larger supramolecular and cellular structures (102, 147).

16 ABBREVIATIONS

0D        zero-dimensional (single particles)
1D        one-dimensional (helical)
2D        two-dimensional
3D        three-dimensional
CCD       charge coupled device (slow scan TV detector)
cryo-EM   electron cryomicroscopy
CTF       contrast transfer function
EM        electron microscope/microscopy
FEG       field emission gun


REFERENCES 

1. S. Brenner and R. W. Horne, Biochim. Biophys. Acta - Prot. Struct.,
34, 103-110 (1959).

2. H. E. Huxley and G. Zubay, J. Mol. Biol., 2, 10-18 (1960).





3. A. Klug and J. E. Berger, J. Mol. Biol., 10, 565-569 (1964).

4. D. J. DeRosier and A. Klug, Nature, 217, 130-134 (1968).

5. W. Hoppe, R. Langer, G. Knesch, and C. Poppe, Naturwissenschaften,
55, 333-336 (1968).

6. H. P. Erickson and A. Klug, Philos. Trans. R. Soc. Lond. B, 261,
105-118 (1971).

7. P. N. T. Unwin and R. Henderson, J. Mol. Biol., 94, 425-440 (1975).

8. J. Dubochet, J. Lepault, R. Freeman, J. A. Berriman, and J.-C. Homo,
J. Microsc., 128, 219-237 (1982c).

9. J. Dubochet, M. Adrian, J.-J. Chang, J.-C. Homo, J. Lepault,
A. W. McDowall, and P. Schultz, Q. Rev. Biophys., 21, 129-228 (1988).

10. K. A. Taylor and R. M. Glaeser, Science, 186, 1036-1037 (1974).

11. R. Henderson, J. M. Baldwin, T. A. Ceska, F. Zemlin, E. Beckmann,
and K. H. Downing, J. Mol. Biol., 213, 899-929 (1990).

12. W. Kühlbrandt, D. N. Wang, and Y. Fujiyoshi, Nature, 367, 614-621
(1994).

13. E. Nogales, S. G. Wolf, and K. H. Downing, Nature, 391, 199-203
(1998).

14. K. Murata, K. Mitsuoka, T. Hirai, T. Walz, P. Agre, J. B. Heymann,
A. Engel, and Y. Fujiyoshi, Nature, 407, 599-605 (2000).

15. L. A. Amos, R. Henderson, and P. N. T. Unwin, Prog. Biophys. Mol.
Biol., 39, 183-231 (1982).

16. T. Walz and N. Grigorieff, J. Struct. Biol., 121, 142-161 (1998).

17. T. S. Baker, N. H. Olson, and S. D. Fuller, Microbiol. Mol. Biol.
Rev., 63, 862-922 (1999).

18. J. Frank, Three-Dimensional Electron Microscopy of Macromolecular
Assemblies, Academic Press, San Diego, CA, 1996, 342 pp.

19. I. Hargittai and M. Hargittai, Eds., Stereochemical Applications of
Gas-Phase Electron Diffraction, VCH, New York, 1988.

20. M. Isaacson, J. Langmore, and H. Rose, Optik, 41, 92-96 (1974).

21. R. Henderson, Q. Rev. Biophys., 28, 171-193 (1995).

22. W. A. Havelka, R. Henderson, and D. Oesterhelt, J. Mol. Biol., 247,
726-738 (1995).

23. R. M. Glaeser, J. Ultrastruct. Res., 36, 466-482 (1971).

24. R. Henderson and P. N. T. Unwin, Nature, 257, 28-32 (1975).

25. J. Frank, Curr. Opin. Struct. Biol., 7, 266-272 (1997).

26. J. Kenney, E. Karsenti, B. Gowen, and S. D. Fuller, J. Struct.
Biol., 120, 320-328 (1997).

27. Y. Tao, N. H. Olson, W. Xu, D. L. Anderson, M. G. Rossmann, and
T. S. Baker, Cell, 95, 431-437 (1998).

28. L. B. Kong, A. C. Siva, L. H. Rome, and P. L. Stewart, Structure, 7,
371-379 (1999).

29. A. C. Bloomer, J. Graham, S. Hovmoller, P. J. G. Butler, and
A. Klug, Nature, 276, 362-368 (1978).

30. R. H. Jacobson, X.-J. Zhang, R. F. DuBose, and B. W. Matthews,
Nature, 369, 761-766 (1994).

31. G. P. A. Vigers, R. A. Crowther, and B. M. F. Pearse, EMBO J., 5,
529-534 (1986).

32. M. Schatz, E. V. Orlova, P. Dube, J. Jager, and M. van Heel,
J. Struct. Biol., 114, 28-40 (1995).

33. R. A. Grant, D. J. Filman, S. E. Finkel, R. Kolter, and J. M. Hogle,
Nat. Struct. Biol., 5, 294-303 (1998).

34. A. Mattevi, G. Obmolova, E. Schulze, K. H. Kalk, A. H. Westphal,
A. D. Kok, and W. G. J. Hol, Science, 255, 1544-1550 (1992).

35. R. A. Milligan, Proc. Natl. Acad. Sci. USA, 93, 21-26 (1996).

36. A. Miyazawa, Y. Fujiyoshi, M. Stowell, and N. Unwin, J. Mol. Biol.,
288, 765-786 (1999).

37. K. Hirose, W. B. Amos, A. Lockhart, R. A. Cross, and L. A. Amos,
J. Struct. Biol., 118, 140-148 (1997).

38. K. Namba and F. Vonderviszt, Q. Rev. Biophys., 30, 1-65 (1997).

39. T.-W. Jeng, R. A. Crowther, G. Stubbs, and W. Chiu, J. Mol. Biol.,
205, 251-257 (1989).

40. A. Cheng, A. N. van Hoek, M. Yeager, A. S. Verkman, and A. K. Mitra,
Nature, 387, 627-630 (1997).

41. V. M. Unger, N. M. Kumar, N. B. Gilula, and M. Yeager, Science, 283,
1176-1180 (1999).

42. D. A. Winkelmann, T. S. Baker, and I. Rayment, J. Cell Biol., 114,
701-713 (1991).

43. K. A. Taylor, J. Tang, Y. Cheng, and H. Winkler, J. Struct. Biol.,
120, 372-386 (1997).

44. B. Bottcher, S. A. Wynne, and R. A. Crowther, Nature, 386, 88-91
(1997).

45. A. Malhotra, P. Penczek, R. K. Agrawal, I. S. Gabashvili,
R. A. Grassucci, R. Junemann, N. Burkhardt, K. H. Nierhaus, and
J. Frank, J. Mol. Biol., 280, 103-116 (1998).

46. R. W. Horne and I. Pasquali-Ronchetti, J. Ultrastruct. Res., 47,
361-383 (1974).





47. H. Yoshimura, M. Matsumoto, S. Endo, and K. Nagayama,
Ultramicroscopy, 32, 265-274 (1990).

48. R. Kornberg and S. A. Darst, Curr. Opin. Struct. Biol., 1, 642-646
(1991).

49. B. Jap, M. Zulauf, T. Scheybani, A. Hefti, W. Baumeister, and
U. Aebi, Ultramicroscopy, 46, 45-84 (1992).

50. E. W. Kubalek, S. F. J. LeGrice, and P. O. Brown, J. Struct. Biol.,
113, 117-123 (1994).

51. J.-L. Rigaud, G. Mosser, J.-J. Lacapere, A. Olofsson, D. Levy, and
J.-L. Ranck, J. Struct. Biol., 118, 226-235 (1997).

52. L. Hasler, J. B. Heymann, A. Engel, J. Kistler, and T. Walz,
J. Struct. Biol., 121, 162-171 (1998).

53. I. Reviakine, W. Bergsma-Schutter, and A. Brisson, J. Struct. Biol.,
121, 356-361 (1998).

54. E. M. Wilson-Kubalek, R. E. Brown, H. Celia, and R. A. Milligan,
Proc. Natl. Acad. Sci. USA, 95, 8040-8045 (1998).

55. A. Polyakov, C. Richter, A. Malhotra, D. Koulich, S. Borukhov, and
S. A. Darst, J. Mol. Biol., 281, 465-473 (1998).

56. M. Adrian, J. Dubochet, J. Lepault, and A. W. McDowall, Nature, 308,
32-36 (1984).

57. J. R. Bellare, H. T. Davis, L. E. Scriven, and Y. Talmon,
J. Electron Microsc. Technol., 10, 87-111 (1988).

58. E. Mayer and G. Asti, Ultramicroscopy, 45, 185-197 (1992).

59. J. Berriman and N. Unwin, Ultramicroscopy, 56, 241-252 (1994).

60. H. D. White, M. L. Walker, and J. Trinick, J. Struct. Biol., 121,
306-313 (1998).

61. S. Subramaniam, M. Gerstein, D. Oesterhelt, and R. Henderson,
EMBO J., 12, 1-18 (1993).

62. D. P. Siegel, W. J. Green, and Y. Talmon, Biophys. J., 66, 402-414
(1994).

63. S. Trachtenberg, J. Struct. Biol., 123, 45-55 (1998).

64. N. Unwin, Nature, 373, 37-43 (1995).

65. S. D. Fuller, J. A. Berriman, S. J. Butcher, and B. E. Gowen, Cell,
81, 715-725 (1995).

66. D. P. Siegel and R. M. Epand, Biophys. J., 73, 3089-3111 (1997).

67. M. Walker, X.-Z. Zhang, W. Jiang, J. Trinick, and H. D. White,
Proc. Natl. Acad. Sci. USA, 96, 465-470 (1999).

68. M. Cyrklaff and W. Kühlbrandt, Ultramicroscopy, 55, 141-153 (1994).

69. J. C. H. Spence, Experimental High-Resolution Electron Microscopy,
Oxford University Press, Oxford, UK, 1988.

70. L. Reimer, Transmission Electron Microscopy, Springer-Verlag,
Berlin, 1989.

71. R. H. Wade and J. Frank, Optik, 49, 81-92 (1977).

72. R. H. Wade, Ultramicroscopy, 46, 145-156 (1992).

73. C. Toyoshima, K. Yonekura, and H. Sasabe, Ultramicroscopy, 48,
165-176 (1993).

74. R. H. Cheng, N. H. Olson, and T. S. Baker, Virology, 186, 655-668
(1992).

75. B. L. Trus, R. B. S. Roden, H. L. Greenstone, M. Vrhel,
J. T. Schiller, and F. P. Booy, Nat. Struct. Biol., 4, 413-420 (1997).

76. F. Zemlin, Ultramicroscopy, 46, 25-32 (1992).

77. Z. H. Zhou and W. Chiu, Ultramicroscopy, 49, 407-416 (1993).

78. F. Zemlin, Micron, 25, 223-226 (1994).

79. E. J. Mancini, F. D. Haas, and S. D. Fuller, Structure, 5, 741-750
(1997).

80. J. Brink, M. B. Sherman, J. Berriman, and W. Chiu, Ultramicroscopy,
72, 41-52 (1998).

81. O. L. Krivanek and P. E. Mooney, Ultramicroscopy, 49, 95-108 (1993).

82. M. B. Sherman, J. Brink, and W. Chiu, Micron, 27, 129-139 (1996).

83. D. J. DeRosier and P. B. Moore, J. Mol. Biol., 52, 355-369 (1970).

84. J. L. Carrascosa and A. C. Steven, Micron, 9, 199-206 (1978).

85. M. van Heel and J. Frank, Ultramicroscopy, 6, 187-194 (1981).

86. I. S. Gabashvili, R. K. Agrawal, C. M. T. Spahn, R. A. Grassucci,
D. I. Svergun, J. Frank, and P. Penczek, Cell, 100, 537-549 (2000).

87. M. Radermacher, T. Wagenknecht, A. Verschoor, and J. Frank,
J. Microsc., 146, 113-136 (1987).

88. M. Radermacher, J. Electron Microsc. Technol., 9, 359-394 (1988).

89. J. Frank, M. Radermacher, P. Penczek, J. Zhu, Y. Li, M. Ladjadj, and
A. Leith, J. Struct. Biol., 116, 190-199 (1996).

90. M. van Heel, Ultramicroscopy, 21, 111-124 (1987a).

91. M. van Heel, G. Harauz, and E. V. Orlova, J. Struct. Biol., 116,
17-24 (1996).

92. M. Radermacher in D.-P. Hader, Ed., Image Analysis in Biology, CRC
Press, Boca Raton, FL, 1991, pp. 219-246.





93. M. Radermacher in J. Frank, Ed., Electron Tomography, Plenum Press,
New York, 1992, pp. 91-115.

94. M. Radermacher, Ultramicroscopy, 53, 121-136 (1994).

95. P. A. Penczek, R. A. Grassucci, and J. Frank, Ultramicroscopy, 53,
251-270 (1994).

96. M. Schatz and M. van Heel, Ultramicroscopy, 32, 255-264 (1990).

97. P. Penczek, M. Radermacher, and J. Frank, Ultramicroscopy, 40, 33-53
(1992).

98. N. Grigorieff, J. Mol. Biol., 277, 1033-1046 (1998).

99. D. E. Olins, A. L. Olins, H. A. Levy, R. C. Durfee, S. M. Margle,
E. P. Tinnel, and S. D. Dover, Science, 220, 498-500 (1983).

100. U. Skoglund and B. Daneholt, Trends Biochem. Sci., 11, 499-503
(1986).

101. J. C. Fung, W. Liu, W. J. DeRuijter, H. Chen, C. K. Abbey,
J. W. Sedat, and D. A. Agard, J. Struct. Biol., 116, 181-189 (1996).

102. W. Baumeister, R. Grimm, and J. Walz, Trends Cell Biol., 9, 81-85
(1999).

103. A. K. Shah and P. L. Stewart, J. Struct. Biol., 123, 17-21 (1998).

104. F. Beuron, M. R. Maurizi, D. M. Belnap, E. Kocsis, F. P. Booy,
M. Kessel, and A. C. Steven, J. Struct. Biol., 123, 248-259 (1998).

105. R. A. Crowther, L. A. Amos, J. T. Finch, D. J. DeRosier, and
A. Klug, Nature, 226, 421-425 (1970).

106. R. A. Crowther, Philos. Trans. R. Soc. Lond., 261, 221-230 (1971).

107. S. D. Fuller, S. J. Butcher, R. H. Cheng, and T. S. Baker,
J. Struct. Biol., 116, 48-55 (1996).

108. R. H. Cheng, V. S. Reddy, N. H. Olson, A. J. Fisher, T. S. Baker,
and J. E. Johnson, Structure, 2, 271-282 (1994).

109. R. A. Crowther, N. A. Kiselev, B. Bottcher, J. A. Berriman,
G. P. Borisova, V. Ose, and P. Pumpens, Cell, 77, 943-950 (1994).

110. T. S. Baker and R. H. Cheng, J. Struct. Biol., 116, 120-130 (1996).

111. J. R. Caston, D. M. Belnap, A. C. Steven, and B. L. Trus,
J. Struct. Biol., 125, 209-215 (1999).

112. R. A. Crowther, R. Henderson, and J. M. Smith, J. Struct. Biol.,
116, 9-16 (1996).

113. J. A. Lawton and B. V. V. Prasad, J. Struct. Biol., 116, 209-215
(1996).

114. P. A. Thuman-Commike and W. Chiu, J. Struct. Biol., 116, 41-47
(1996).

115. I. M. Boier Martin, D. C. Marinescu, R. E. Lynch, and T. S. Baker,
J. Struct. Biol., 120, 146-157 (1997).

116. Z. H. Zhou, W. Chiu, K. Haskell, H. J. Spears, J. Jakana,
F. J. Rixon, and L. R. Scott, Biophys. J., 74, 576-588 (1998).

117. P. L. Stewart, C. Y. Chiu, S. Huang, T. Muir, Y. Zhao, B. Chait,
P. Mathias, and G. R. Nemerow, EMBO J., 16, 1189-1198 (1997).

118. J. Walz, T. Tamura, N. Tamura, R. Grimm, W. Baumeister, and
A. J. Koster, Mol. Cell, 1, 59-65 (1997).

119. M. Stewart, J. Electron Microsc. Technol., 9, 325-358 (1988).

120. C. Toyoshima and N. Unwin, J. Cell Biol., 111, 2623-2635 (1990).

121. D. G. Morgan and D. DeRosier, Ultramicroscopy, 46, 263-285 (1992).

122. N. Unwin, J. Mol. Biol., 229, 1101-1124 (1993).

123. R. Beroukhim and N. Unwin, Ultramicroscopy, 70, 57-81 (1997).

124. E. H. Egelman, Ultramicroscopy, 19, 367-374 (1986).

125. B. Carragher, M. Whittaker, and R. A. Milligan, J. Struct. Biol.,
116, 107-112 (1996).

126. C. H. Owen, D. G. Morgan, and D. J. DeRosier, J. Struct. Biol.,
116, 167-175 (1996).

127. R. Henderson, J. M. Baldwin, K. H. Downing, J. Lepault, and
F. Zemlin, Ultramicroscopy, 19, 147-178 (1986).

128. J. M. Baldwin, R. Henderson, E. Beckman, and F. Zemlin, J. Mol.
Biol., 202, 585-591 (1988).

129. S. Hardt, B. Wang, and M. F. Schmid, J. Struct. Biol., 116, 68-70
(1996).

130. R. A. Crowther and P. K. Luther, Nature, 307, 569-570 (1984).

131. H. Winkler and K. A. Taylor, J. Struct. Biol., 116, 241-247 (1996).

132. J. Frank in J. Frank, Ed., Electron Tomography: Three-Dimensional
Imaging with the Transmission Electron Microscope, Plenum Press,
New York, 1992, 399 pp.

133. J. Frank, A. Verschoor, and M. Boublik, Science, 214, 1353-1355
(1981).

134. M. van Heel, Ultramicroscopy, 21, 95-100 (1987b).

135. M. van Heel and J. Hollenberg in W. Baumeister and W. Vogell, Eds.,
Electron Microscopy at Molecular Dimensions, Springer-Verlag, Berlin,
1980, pp. 256-260.

136. M. Unser, B. L. Trus, J. Frank, and A. C. Steven, Ultramicroscopy,
30, 429-434 (1989).





137. D. A. Agard, J. Mol. Biol., 167, 849-852 (1983).

138. J. M. Valpuesta, J. L. Carrascosa, and R. Henderson, J. Mol. Biol.,
240, 281-287 (1994).

139. J. T. Finch, J. Mol. Biol., 66, 291-294 (1972).

140. Z. H. Zhou, S. Hardt, B. Wang, M. B. Sherman, J. Jakana, and
W. Chiu, J. Struct. Biol., 116, 216-222 (1996).

141. N. H. Olson and T. S. Baker, Ultramicroscopy, 30, 281-298 (1989).

142. D. M. Belnap, N. H. Olson, and T. S. Baker, J. Struct. Biol., 120,
44-51 (1997).

143. Y. Fujiyoshi, T. Mizusaki, K. Morikawa, H. Yamagishi, Y. Aoki,
H. Kihara, and Y. Harada, Ultramicroscopy, 38, 241-251 (1991).

144. F. Zemlin, E. Beckmann, and K. D. van der Mast, Ultramicroscopy,
63, 227-238 (1996).

145. N. Kisseberth, M. Whittaker, D. Weber, C. S. Potter, and
B. Carragher, J. Struct. Biol., 120, 309-319 (1997).

146. M. Hadida-Hassan, S. J. Young, S. T. Peltier, M. Wong, S. Lamont,
and M. H. Ellisman, J. Struct. Biol., 125, 235-245 (1999).

147. B. F. McEwen, K. H. Downing, and R. M. Glaeser, Ultramicroscopy,
60, 357-373 (1995).



CHAPTER FIFTEEN 


Peptidomimetics for Drug 
Design 


M. Angels Estiarte 
Daniel H. Rich 

School of Pharmacy, Department of Chemistry
University of Wisconsin-Madison 
Madison, Wisconsin 

Contents

1 Introduction
2 Classification of Peptidomimetics
3 Design of Conformationally Restricted Peptides
4 Template Mimetics
5 Peptide Bond Isosteres
6 From Transition-State Analog Inhibitors to Non-Peptide Inhibitors:
  Examples in Protease Inhibitors
  6.1 TSA in Aspartic Peptidase Inhibitors
  6.2 TSA in Metallo Peptidase Inhibitors
  6.3 TSA-Derived Cysteine and Serine Peptidase Inhibitors
7 Speeding up Peptidomimetic Research
8 Toward Rational Drug Design: Discovery of Novel Non-Peptide
  Peptidomimetics
9 Historical Development of Important Non-Peptide Peptidomimetics
  9.1 HIV Protease
  9.2 Thrombin
  9.3 Factor Xa
  9.4 Glycoprotein IIb/IIIa (GP IIb/IIIa)
  9.5 Ras-Farnesyltransferase
  9.6 Non-Peptidic Ligands for Peptide Receptors
    9.6.1 Angiotensin II
    9.6.2 Substance P
    9.6.3 Neuropeptide Y
    9.6.4 Growth Hormone Secretagogues
    9.6.5 Endothelin
10 Summary and Future Directions

Burger's Medicinal Chemistry and Drug Discovery

Sixth Edition, Volume 1: Drug Discovery 

Edited by Donald J. Abraham 

ISBN 0-471-27090-3 © 2003 John Wiley & Sons, Inc. 




1 INTRODUCTION 

Protein-protein interactions are central to biology and provide one
mechanism to convert genomic information into regulated biological
responses. Important examples of protein-peptide interactions include
the binding of peptide ligands to proteases, the binding of peptide
hormones to peptide receptors, the recruitment of proteins to effect
signal transduction, and apoptosis. Peptides also act as
neurotransmitters, neuromodulators, hormones, and autocrine and
paracrine factors. Unfortunately, their use as pharmaceutical drugs is
made difficult by their poor pharmacokinetic profiles; they are easily
proteolyzed, poorly transported, and rapidly excreted. Although modern
formulation techniques have improved delivery of peptides (e.g.,
inhalation of insulin), there remains a need for small potent molecules
that can be administered orally.

For these reasons, much effort has been expended to find ways to replace
portions of peptides with non-peptide structures, called
peptidomimetics, in the hope of obtaining orally bioavailable entities.
Several types of peptidomimetics have been developed, and the field has
emerged as one of the important approaches to drug design and discovery.
This review will describe the various methods developed to design
peptidomimetics. Due to space limitations, the biological rationale for
each peptidomimetic and its chemical synthesis cannot be covered.
Selected examples of the strategies employed to obtain peptidomimetics
are provided to illustrate the breadth of research in this field.


2 CLASSIFICATION OF 
PEPTIDOMIMETICS 

The term peptidomimetic is often used in the literature to indicate a
multitude of structural types that differ in fundamental ways.
Comparisons between peptidomimetics suffer from the lack of accepted
definitions of what a peptidomimetic is (1). The term is often applied
to highly modified analogs of peptides without distinguishing how these
differ from classical analogs of peptides. For example, peptide (2),
derived from the decapeptide LH-RH (1) (2), contains only five amino
acids, none of which is present in the parent compound, yet it is a
powerful antagonist of the LH-RH receptor (Fig. 15.1) (2). Is (2) a
peptide analog or a peptidomimetic?

In the 1970s, Hughes et al. were the first to show that two very
different chemical structures have similar agonist properties (3). The
opioid natural product, morphine (3), was found to resemble the
N-terminal structure of the endogenous opioid peptides, enkephalins,
(4a) and (4b), and β-endorphin (5) (Fig. 15.2). The remarkable
similarity between the morphine phenol system and the N-terminal
tyrosine residue in the peptide opioids implied that these units reacted
with opioid receptors in a similar fashion to elicit comparable
responses (4-6).

The realization that a non-peptide natural product was mimicking the
action of a natural peptide effector led Farmer to postulate that other
non-peptide structures might be found that would mimic other peptide
effectors (7). His concept of "peptide mimetic," which later was called
"peptidomimetic," proposed that novel scaffolds could be designed to
replace the entire peptide backbone while retaining isosteric topography
of the enzyme-bound peptide (or assumed receptor-bound) conformation.
Farmer's definition went beyond simple replacement of amide bonds and
the concept of stringing together conformationally restricted amino acid
derivatives to mimic the native peptide structure. In the intervening
years, many non-peptide and partially peptide structures have been found
that mimic (or antagonize) the action of the peptide ligand at its
receptor; this is particularly true with substances active at
G-protein-coupled receptors.

pGlu-His-Trp-Ser-Tyr-Gly-Leu-Arg-Pro-Gly-NH2
LH-RH (1)

(4-fluorophenyl)propionyl-D1Nal-NMeTyr-DLys(Nic)-Lys(Isp)-DAla-NH2
A-76154, ED50 = 10.3 pg/ml (2)

Figure 15.1. Reduced-size antagonist of LH-RH.

Met-enkephalin (4a)   Tyr-Gly-Gly-Phe-Met
Leu-enkephalin (4b)   Tyr-Gly-Gly-Phe-Leu
β-Endorphin (5)       Tyr-Gly-Gly-Phe-Met-Thr-Ser-Glu-Lys-Ser-
                      Gln-Thr-Pro-Leu-Val-Thr-Phe-Lys-Asn-Ala-
                      Ile-Ile-Lys-Asn-Ala-Tyr-Lys-Lys-Gly-Glu
Morphine (3)

Figure 15.2. Examples of peptidic and non-peptidic opioid receptor ligands.

The pyrrolinone unit (6) designed by Smith and Hirschmann illustrates a
modern use of these two concepts (Fig. 15.3) (8). Pyrrolinones constrain
the peptide-like side-chains into an extended β-structure topography
that fits the active sites of most peptidases; pyrrolinones are
resistant to normal proteolysis because no α-amino acid units remain,
and the units impart sophisticated partitioning properties to the final
inhibitor. Pyrrolinones, like many peptide-derived peptidomimetics,
retain an atom-to-atom correspondence to the parent peptide, especially
with respect to the peptide backbone structure. Most of these structures
contain elements that accomplish one of two objectives: they replace
amide bonds with metabolically stable units, and they effect a
conformational constraint on peptides or on the peptide replacement. In
contrast, heterocyclic natural products or screening leads that bind to
peptide receptors also have been called peptidomimetics by virtue of
their mimicking (or antagonizing) the function of the natural peptide.
Although structural data confirming mimicry of the designed mimetics are
rarely available for receptor-bound ligands, ample evidence is available
from X-ray crystallography that heterocyclic inhibitors are mimicking
the extended β-strand of enzyme-bound substrate-derived inhibitors
(vide infra).

Based on these considerations, four distinct types of peptidomimetics
have been identified to date (9, 10). The first invented were structures
that contain one or more mimics of the local topography about an amide
bond (amide bond isosteres). Strictly speaking, these are properly
classified as pseudopeptides (11), but in recent years, they have been
called peptidomimetics on occasion. For historical reasons, we classify
the peptide backbone mimetics as type I mimetics (Table 15.1). These
mimetics often match the peptide backbone atom-for-atom while retaining
functionality that makes important contacts with binding sites. Some
units mimic short portions of secondary structure (e.g., β-turns) and
have been used to generate lead compounds. Many early protease
inhibitors were designed from transition state analog mimetics or from
collected substrate/product mimetics, each designed to mimic reaction
pathway intermediates of the enzyme-catalyzed reaction. These are mimics
of the peptide bond in a putative transition state or product state and
will be classified here as peptidomimetics.

Figure 15.3. Correlation of pyrrolinone-based peptidomimetics and the
parent peptide.

Table 15.1 Peptidomimetic Types

          Peptidomimetic                                                     Examples
Type I    Peptide backbone mimetics     Substrate-based design               Pseudopeptides
Type II   Functional mimetics           Molecular modeling, HTS              GPCR antagonists
Type III  Topographical mimetics        Structure-based design               Non-peptide protease inhibitors
Type IV   Non-peptide peptidomimetics   Group Replacement Assisted Binding   Piperidine inhibitors

The second type of mimetic to emerge was the functional mimetic, or
type II mimetic, which is a small non-peptide molecule that binds to a
peptide receptor. Morphine was the first well-characterized example of
this type of peptidomimetic. Initially, type II mimetics were presumed
to be direct structural analogs of the natural peptide, but
characterization of both the endogenous peptide and antagonist's binding
sites by site-directed mutagenesis has raised doubts about this
interpretation (12). The mutagenesis data indicate that antagonists for
a large number of receptors seem to bind to receptor subsites different
than those used by the parent peptide. Consequently, functional mimetics
may not mimic the structure of the parent hormone; this remains to be
determined. Despite this uncertainty, the approach has been quite
successful and produced a number of potential drug lead structures.

Type III mimetics represent the Farmer definition of peptidomimetics in
that they possess novel templates, which appear unrelated to the
original peptides but contain the essential groups, positioned on a
novel non-peptide scaffold to serve as topographical mimetics. Several
type III peptidomimetic protease inhibitors have been characterized for
which X-ray structures of both the peptide-derived inhibitor complex and
the heterocyclic non-peptide inhibitor complex have been determined and
compared. These examples demonstrate that alternate scaffolds can
display side-chains so that they interact with proteins in a fashion
closely related to that of the parent peptide.

Recently, a fourth type of peptidomimetic called a GRAB-peptidomimetic
(group replacement-assisted binding) has been identified (10). These
structures might share structural-functional features of type I
peptidomimetics, but they bind to an enzyme form not accessible with
type I peptidomimetics.

Previous reviews on peptidomimetics have addressed pseudopeptides (11),
macrocyclic mimetics (13), natural product mimetics (14), cyclic
protease inhibitors (15), mimetics for receptor ligands (16-22), and
earlier general overviews (23-29). This review will focus on the design
process itself. Novel peptidomimetics in which the structural
relationship between parent peptide and the peptidomimetic has been
established by biophysical methods are used to clarify the principles.
Successful approaches are highlighted to illustrate how these concepts
are currently used.

3 DESIGN OF CONFORMATIONALLY 
RESTRICTED PEPTIDES 

Peptide derivatives that contain conformationally restricting amino acid
units or other conformational constraints were first called
conformationally constrained (or restricted) peptide analogs. The use of
steric hindrance or cyclization to limit rotational degrees of freedom
in biologically active molecules has a long history and was originally
applied to non-peptide neurotransmitters (30). Subsequently, it was
applied to amino acid substituents and to cyclic peptides (31, 32) and
to control secondary structure in model proteins.

TRH (7)

Figure 15.4. Structure of TRH tripeptide.

Conformational restriction is a very powerful method for probing the
bioactive conformations of peptides. Small peptides have many flexible
torsion angles so that enormous numbers of conformations are possible in
solution. For example, a simple tripeptide such as thyrotropin-releasing
hormone (TRH; 7) (Fig. 15.4) with six flexible bonds could have over
65,000 possible conformations. The number of potential conformers for
larger peptides is enormous, and some method is needed to exclude
potential conformers. Modern biophysical methods, e.g., X-ray
crystallography or isotope-edited nuclear magnetic resonance (NMR) (33),
can be used to characterize peptide-protein interactions for soluble
proteins, but most biophysical methods cannot yet determine the
conformation of a ligand bound to constitutive receptors, e.g.,
G-protein-coupled receptors (34, 35).
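
The combinatorial growth behind such estimates can be sketched in a few lines. The example below assumes that each flexible bond is sampled at a fixed number of discrete torsion states and that bonds are independent; the text does not say how the figure of 65,000 was obtained, so the sampling grids and function names are illustrative assumptions only.

```python
def states_per_bond(step_degrees):
    """Number of discrete torsion states when sampling a full turn at a fixed step."""
    if step_degrees <= 0 or 360 % step_degrees != 0:
        raise ValueError("step must be a positive divisor of 360")
    return 360 // step_degrees

def count_conformers(n_flexible_bonds, n_states):
    """Upper bound on conformers if every bond independently takes n_states values."""
    return n_states ** n_flexible_bonds

# Six flexible bonds (as quoted for TRH), sampled every 60 degrees (assumed grid).
print(count_conformers(6, states_per_bond(60)))   # 46656
# Seven states per bond (assumed) already exceeds the quoted figure.
print(count_conformers(6, 7))                     # 117649
# Doubling the number of flexible bonds at the 60-degree grid.
print(count_conformers(12, states_per_bond(60)))  # 2176782336
```

On these assumptions, the quoted count for six bonds corresponds to between six and seven states per bond, and each added flexible bond multiplies the total by the number of states, which is why conformational constraints that freeze even a few torsions remove most of the search space.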

Cyclization is one of the earliest techniques applied to design
peptidomimetics. Cyclic peptides are more stable to amide bond
hydrolysis and allow less conformational flexibility; consequently, the
resulting analogs are anticipated to be more selective and less toxic.
Methods for restricting conformations include peptide backbone
cyclization, disulfide bond formation, side-chain cyclization, and metal
ion chelation.

The first successful application of conformational restriction to
peptide chemistry was carried out by Veber et al. at Merck (36), who
were trying to simplify the structure of somatostatin (8) (Fig. 15.5) to
produce an orally active derivative. Their approach was to introduce
conformational restraints into the macrocyclic peptide ring system to
reduce the number of conformations available to the analog. Not all
substitutions were expected to produce biologically active products, but
those that retained activity were assumed to be able to adopt
conformations close to the normal bioactive conformation. This work
began from the earlier discovery by Rivier et al. (37) that replacement
of L-tryptophan at position 8 of somatostatin by D-tryptophan produced
an analog that retained biological activity. This unusual biological
result is possible when a D,L-sequence (D-Trp-Lys) replaces an
L,L-sequence (Trp-Lys) in a peptide at a type II β-turn, because the
topography of the amino acid side-chains at these positions is
essentially identical in these turns (38). These results led Veber
et al. to postulate that the amino acid sequence Phe-Trp-Lys-Thr might
be part of a type II β-turn, and that this tetrapeptide sequence might
comprise the active pharmacophore. Although this hypothesis was highly
speculative for its time, it was shown to be essentially correct by
applying the principle of conformational restriction (Fig. 15.5).
Deletion of the N-terminal dipeptide, followed by insertion of the D-Trp
at position 8, and replacement of the disulfide sulfurs with carbons
produced analog (9). NMR and other data suggested that the two Phe
side-chains were clustered, thus they were replaced by a transannular
disulfide bond limiting the available conformation, as in compound (10).
After several iterations of this process, a biologically active cyclic
hexapeptide (11) was discovered that retained only 6 of the original 14
amino acids in somatostatin yet produced a fully active derivative (31).

The work of Veber et al. established that valuable information about the
bioactive conformation of a flexible peptide could be obtained by
applying the principles of conformational restriction, and several
additional examples soon were reported that followed this strategy.
Conformationally restricted enkephalin analogs, e.g., (12-13), were
formed by cyclizing between positions 2 and 5 of enkephalins (4a-b)
(39). Cyclization of α-melanotropin (14) gave the unusually active
analog (15) (40). Small cyclic analogs of endothelin (16) (41) have been
discovered by applying these methods, as illustrated by (17)
(Fig. 15.6). Peptide chemists routinely apply conformational restriction
in their attempts to determine possible bioactive conformations.

Figure 15.5. Conformationally restricted somatostatin analogs.

H2N-Tyr-Gly-Gly-Phe-Leu-OH
Leu-enkephalin (4a)

H2N-Tyr-Gly-Gly-Phe-Met-OH
Met-enkephalin (4b)

Ac-HN-Ser-Tyr-Ser-Met-Glu-His-Phe-Arg-Trp-Gly-Lys-Pro-Val-CONH2
α-Melanotropin (14)

H2N-Cys-Ser-Cys-Ser-Ser-Leu-Met
HO-Trp-Ile-Ile-Asp-Leu-His-Cys-Phe-Tyr-Val-Cys-Glu-Lys-Asp
Endothelin (16)

Figure 15.6. Cyclic hormone peptide analogs.

Flexible peptides can be conformationally restricted by a variety of
methods other than macrocyclization of the peptide. For example,
Marshall et al. introduced α-methyl amino acid substituents into
peptides as a way to decrease the conformational space available to the
resulting peptide (42); these types of approaches led to the "Active
Analog" approach for determining bioactive conformations of flexible
molecules (43). Some other traditional modifications of the peptide
substrate are the replacement of the amino acids of the P1-P1' cleavage
site by D-amino acids or the use of α-C or α-N alkylated amino acids
and cyclic or β-amino acids (Fig. 15.7).

Mimicking the secondary structure of peptides has become one of the
most important tools for rational drug design (44-47). These methods
induce the synthetic analog to adopt a set of target conformations,
which are designed to mimic the bioactive conformation predicted in the
native substrate from biophysical techniques. Molecular surrogates have
been found that efficiently mimic turns, strands, sheets, and helices.
By far, the major efforts have focused on the design of β-turn
mimetics. Some of the templates used to constrain the conformational
torsion angles of the peptide chain are summarized in Figs. 15.8-15.14.

Figure 15.7. Representative amino acid mimetics.

In a very early example, Freidinger et al. developed a series of cyclic
lactams that stabilized β- and γ-turn structures in linear peptides
(Fig. 15.8). This strategy was applied to determine conformations of
LH-RH that are consistent with the turn structure permitted by the
constraint. For example, the 3-aminolactam (18) was used to mimic a
β-turn conformation. When inserted in LH-RH, compound (19) retained
good biological activity, so the bioactive conformation of LH-RH
probably contains a β-turn around residues 6 and 7 (48).

Conformational restriction has also been used to determine the
bioactive conformation of enzyme-inhibitor systems for which no X-ray
crystal structure is available. Thorsett et al. (49) synthesized
conformationally restricted bicyclic lactam derivatives of the
angiotensin converting enzyme (ACE) inhibitors enalapril (20) and
enalaprilat (21) (Fig. 15.9) to characterize torsion angles in the
bioactive conformation. Analog (22) was used to constrain the torsion
angle psi (ψ). Flynn et al. (50) extended this principle to prepare the
very tight-binding tricyclic ACE inhibitor (23) (Fig. 15.9).

Figure 15.8. γ-Lactam analog of LH-RH. A β-turn mimetic. [Structures
not reproduced; labels: β-turn; (18); LH-RH β-turn mimetic (19), ending
in Arg-Pro-Gly-NH2.]

Figure 15.9. Conformationally restricted ACE inhibitors.

Several other γ-, δ-, and ε-lactam derivatives have been prepared and
inserted into receptor antagonists or agonists. For instance, the
thiazolidine lactam (24) (Fig. 15.10) has been shown to induce the
desired secondary structure in a gramicidin S analog. Later, it was
used to prepare a conformationally restricted cyclosporin A analog
(51). Several β-turn and γ-turn mimetics are shown in Figs.
15.10-15.12, and many other examples are available in the recent
literature (52-54).



Figure 15.10. Lactams as β-turn mimetics.

Figure 15.13. Structures of β-sheet mimetics.


vided mimetics of multiple discontinuous protein surfaces (56). Over
the last few years, the Gellman, Seebach, and Hanessian research groups
have invented novel helical structures (e.g., 31, 32) by use of β-, γ-,
and δ-peptides (58).

It is important to stress that even a small change in the structure or
in a single torsional angle can be sufficient to dramatically modify
the conformation of the resulting peptide. Numerous additional
conformational constraints have been developed, and the reader is
encouraged to consult these reviews for additional examples (32,
59-63).

4 TEMPLATE MIMETICS 

Highly functionalized molecular scaffolds have proven to be very
successful in mimicking specific protein-protein interactions.
Insertion of the key pharmacophoric groups into a nonpeptidic framework
has provided good inhibitors of a variety of biological targets.

This technique has been successfully applied to biological targets for
which the amino acids of the native peptide that are key to recognition
are known. Examples include glycoprotein GPIIb/IIIa inhibitors (33)
that mimic the RGD sequence (64) and Ras-farnesyltransferase inhibitors
(34) that mimic the CAAX sequence (Fig. 15.15) (65).

An early example of this concept was developed by Hirschmann et al. in
the design of a somatostatin analog (Fig. 15.15) (55). Three of the
four crucial amino acid side-chains of the parent peptide (Tyr, Trp,
and Lys) were positioned on a sugar template (35). Although originally
designed as a somatotropin release inhibitory factor (SRIF) antagonist,
compound (35) also proved to be a good Substance P antagonist. These
sugar derivatives, as well as the benzodiazepine, diphenylmethane, and
spiropiperidine scaffolds, are elements found in ligands for a variety
of receptors, and have been designated as "privileged structures" (66).
Thus, these common scaffolds can often provide a template for further
optimization of a desired activity. Evans et al. have noted that the
essential surface area of biologically active peptides is similar to
the surface area of benzodiazepines, one type of non-peptide scaffold
known to bind to G protein-coupled receptors (67).

Figure 15.14. Newer templates found in helical or loop structures.

The quest for functionalized lead structures that effectively mimic the
"hot spots" within the biological ligand is not easy (68). Molecular
modeling and high-throughput screening (HTS) are techniques that are
currently used for this purpose and have been summarized elsewhere.

The design and synthesis of antifungal analogs of the cyclic peptide
rhodopeptin (36) (Fig. 15.16) illustrate a recent application of
peptidomimetic scaffolding in which the structure of the biological
target is not known. After structure-activity relationship (SAR)
studies had identified the important side-chains of the peptide ligand,
NMR and molecular modeling techniques were used to model these
side-chains onto known scaffolds and to compare the results with the
original three-dimensional (3D) structure of the native peptide.
Compound (37) (Fig. 15.16) is a potent peptidomimetic derivative with
improved solubility in water that functions the same as the cyclic
tetrapeptide (69, 70).

5 PEPTIDE BOND ISOSTERES 

The replacement of amide bonds by retro-inverso amide replacements
(71, 72) and other amide bond isosteres generates pseudopeptides (11).
This process was first used to stabilize peptide hormones in vivo, and
later to prepare transition-state analog (TSA) inhibitors. Systematic
efforts to convert good in vitro inhibitors into good in vivo
inhibitors became the driving force for further development of
peptidomimetics. Figure 15.17 illustrates some of the peptide backbone
modifications that have been made in an effort to increase
bioavailability. Replacement of scissile amide (CONH) bonds with groups
insensitive to hydrolysis (e.g., CH2NH) has been extensively practiced.
Reviews of this work have appeared (11, 73). Removal of the proton
donors and acceptors in an amide bond also reduces hydration, which
improves the ability of the compounds to penetrate lipid membranes
(74). These approaches represent important first steps in the
development of peptidomimetics. However, removal of an amide bond also
affects the geometry and increases the flexibility of the molecule at
this position, which decreases ligand binding. Effective analogs have
been obtained when conformational restriction, transition-state analog
design, and amide bond replacements have been applied to scaffolds with
molecular weights below 500-600 (75, 76), but at present this process
is very labor intensive.

Figure 15.15. Biologically active template mimetics. [Structures not
reproduced; labels: somatostatin; (35).]

Figure 15.16. Rhodopeptin analogs. Representative example of
scaffolding methodology.

Figure 15.17. Isosteres that replace peptide backbone amide groups to
generate pseudopeptides. [Structures not reproduced; labels: X = NH, O,
S, CH2; X = O, S; retro-inverso.]
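The 500-600 molecular weight window cited for amide bond replacement
scaffolds is easier to appreciate with a quick estimate. The following
is a minimal sketch, not a method from this chapter: it sums standard
average residue masses and adds one water for the free termini. The
names RESIDUE_MASS and peptide_mw are illustrative, and the estimate
assumes a linear, unmodified peptide with a free amine and a free acid.

```python
# Average residue masses in g/mol (standard values for residues within a chain).
RESIDUE_MASS = {
    "Gly": 57.052, "Ala": 71.079, "Ser": 87.078, "Pro": 97.117,
    "Val": 99.133, "Thr": 101.105, "Cys": 103.139, "Leu": 113.159,
    "Ile": 113.159, "Asn": 114.104, "Asp": 115.089, "Gln": 128.131,
    "Lys": 128.174, "Glu": 129.116, "Met": 131.193, "His": 137.141,
    "Phe": 147.177, "Arg": 156.188, "Tyr": 163.176, "Trp": 186.213,
}
WATER = 18.015  # one water accounts for the free N- and C-termini


def peptide_mw(sequence):
    """Approximate average molecular weight of a linear, unmodified peptide.

    `sequence` is a hyphen-separated string of three-letter codes,
    for example "Tyr-Gly-Gly-Phe-Leu".
    """
    residues = sequence.split("-")
    return sum(RESIDUE_MASS[r] for r in residues) + WATER


leu_enkephalin = peptide_mw("Tyr-Gly-Gly-Phe-Leu")
met_enkephalin = peptide_mw("Tyr-Gly-Gly-Phe-Met")
print(f"Leu-enkephalin: {leu_enkephalin:.1f} g/mol")
print(f"Met-enkephalin: {met_enkephalin:.1f} g/mol")
```

Under these assumptions even the pentapeptide enkephalins already fall
inside the 500-600 range, which gives a sense of how little room that
window leaves for peptide-like structures.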

6 FROM TRANSITION-STATE ANALOG 
INHIBITORS TO NON-PEPTIDE 
INHIBITORS: EXAMPLES IN PROTEASE 
INHIBITORS 

Many peptidomimetics derive from the design of TSA inhibitors,
molecules designed according to the hypothesis provided by Pauling (77)
and implemented by Wolfenden (78, 79). TSA protease inhibitors are
stable analogs of the tetrahedral intermediate for peptide bond
hydrolysis that inhibit the enzyme (Fig. 15.18). The first successful
commercial application was the development of captopril (38) by Ondetti
et al. (80), and many applications have been reported over the past
quarter century.

Figures 15.19-15.32 list examples of analogs of peptidyl transition
states that have been employed to develop inhibitors of four classes of
peptidases (81, 82). These units are used to replace the scissile amide
bond in a substrate sequence with either an amino acid or dipeptide
isostere, or with a chelating moiety in the case of metallopeptidases.
The dipeptide TSA provides the functionality that interacts tightly
with the enzyme catalytic groups, while the amino acid sequence up- and
downstream on the peptide chain provides interactions that lead to
selective inhibition of the target enzyme. The enzyme active site
typically is buried in a cleft capable of accommodating three to nine
amino acid residues of the substrate/inhibitor, depending on the
minimum amino acid sequence necessary for hydrolysis. The inhibitor's
exquisite selectivity derives from the interactions of the ligand's
Pn-Pn' residues with the enzyme binding sites (S6-S3') (83). Recently,
some aspartic and serine peptidase inhibitors have been found that
access an additional binding site sub-pocket (S3sp) to increase both
inhibitor potency and selectivity (84-86).

Figure 15.18. TSA inhibit peptide bond hydrolysis. [Scheme not
reproduced; labels: tetrahedral intermediate, bond cleavage;
transition-state analog, no bond cleavage.]

Figure 15.19. TSA used to inhibit aspartic peptidases.

6.1 TSA in Aspartic Peptidase Inhibitors 

The reduced amide isostere (39), developed by Szelke, and the statine
(hydroxymethylene) isostere (40) were early transition-state analogs
used to design inhibitors of various aspartic proteases (87-89), and
their success led to other tetrahedral intermediate mimics such as the
hydroxyethylene (41) and hydroxyethylamine (42) isosteres (Fig. 15.19)
(90-92). The statine subunit, which mimics the tetrahedral
intermediate, represents one of the earlier examples of TSA, although
statine is one atom short in backbone length to be a true dipeptide or
two atoms too long to be an isosteric replacement for a single amino
acid.

Early work focused on developing inhibitors of renin as potential
antihypertensive agents, but these compounds failed to become drugs,
primarily because of difficulties in obtaining orally active compounds.
As a result, the first pharmaceutical attempts to develop renin
inhibitors for treatment of hypertension through TSA-based inhibitors
failed (93). It was eventually realized, after extensive modifications
to the ancillary peptide functionality, that developing bioavailable
peptide-derived inhibitors critically depended on the molecular weight
of the inhibitor. Developing inhibitors for HIV protease was
substantially easier

Figure 15.21. Peptide-derived TSA inhibitors as β-secretase inhibitors.
[Structures not reproduced; recoverable labels follow.]
β-Secretase cleaves APP at (↓ marks the scissile bond):
-Ser-Glu-Val-Lys-Met-↓-Asp-Ala-Glu-Phe-Arg-
-Ser-Glu-Val-Asn-Leu-↓-Asp-Ala-Glu-Phe-Arg-
Compound labels in printed order: IC50 = 30 nM, (43); (44); Ki = 2.5
nM, (45).


protease inhibitors now in clinical use (Fig. 15.20) have excellent
oral bioavailability and establish that application of the
transition-state analog design process can be very successful in
favorable cases.

More recently, the principles for designing inhibitors of aspartic
proteases have been applied to the design of inhibitors of β-secretase
(BACE or memapsin-2) as potential agents for treating or preventing
Alzheimer's disease (95, 96). Both statine-derived inhibitors (43) and
hydroxyethylene-derived BACE inhibitors have been reported (Fig. 15.21)
(97, 98). A crystal structure of (44) bound to β-secretase has been
reported (99). As expected, the hydroxyl group is hydrogen bonded to
Asp32 and Asp228, as in other hydroxyethylene derivatives, and the
inhibitor binds in an extended conformation. Because the target
β-secretase is within the CNS, successful inhibitors have to penetrate
the blood-brain barrier readily, a property not yet achieved with any
of the peptidomimetic inhibitors currently available.

With the crystal structure in hand, structure-based modification of the
parent lead compound has just started to provide new peptidomimetic
structures with lower molecular weight and fewer hydrogen bonds (e.g.,
45) (Fig. 15.21), opening further avenues to pharmacologically useful
compounds (100).

Figure 15.22. Examples of TSA as ACE inhibitors. [Structures not
reproduced; labels: captopril (38); enalapril, R = Et; enalaprilat,
R = H, (46); (47).]

6.2 TSA in Metallo Peptidase Inhibitors 

The discovery of the angiotensin converting enzyme inhibitors in the
mid-1970s constitutes one of the major advances in the rational design
of drugs, the consequences of which are still being realized. The
discovery of these metallopeptidase inhibitors was carried out by
Ondetti et al. as part of a long-term study to develop antihypertensive
drugs (80); in 1999 they received the Lasker Prize in Clinical Medicine
for their work.

Angiotensin converting enzyme (ACE) is a carboxy zinc metallodipeptidase
that cleaves His-Leu from the C-terminus of angiotensin I. Ondetti et
al. reasoned that the product of the normal reaction, the carboxyl
group, could bind to the active site zinc ion, and that the carboxyl
group of a collected-product inhibitor also could bind weakly. To
improve the interaction between inhibitor and enzyme zinc ion, they
replaced the carboxyl group with a sulfhydryl group, which binds zinc
about 1000 times more tightly. This led to captopril (Capoten) (38)
(Fig. 15.22) (80). Later developments by other companies led to many
ACE inhibitors. Some are illustrated by enalaprilat (46) and lisinopril
(47) (Fig. 15.22) (101, 102).
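To put the roughly 1000-fold tighter zinc binding of the sulfhydryl
group on an energy scale, the ratio can be converted to a free-energy
difference with the standard relation ΔΔG = -RT ln(ratio). The sketch
below is illustrative only. It assumes that the 1000-fold figure is a
ratio of equilibrium binding constants and that the temperature is
298.15 K; neither assumption is stated in the text, and the function
name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_affinity_ratio(fold_tighter, temperature_k=298.15):
    """Free-energy change (kcal/mol) for a fold improvement in affinity.

    A negative value means more favorable binding. Assumes the fold
    change is a ratio of equilibrium association constants.
    """
    return -R_KCAL * temperature_k * math.log(fold_tighter)


ddg = ddg_from_affinity_ratio(1000.0)
print(f"{ddg:.2f} kcal/mol")  # prints "-4.09 kcal/mol"
```

On these assumptions, the carboxyl-to-sulfhydryl swap is worth about 4
kcal/mol of binding energy.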

Most metallopeptidase inhibitors append a zinc chelating functionality
to a peptide or peptidomimetic that is recognized by the S1-S2'
subsites in the target enzyme. Successful clinical candidates
invariably contain groups that replace the initial di- and tri-peptide
moieties to achieve selectivity and oral activity. For example, neutral
endopeptidase (NEP), another endopeptidase involved in degrading the
larger opioid peptides dynorphin and/or endorphin, is inhibited by
thiorphan (48) (103) and a variety of other NEP inhibitors, such as
retro-thiorphan (49) (104) and kelatorphan (50) (Fig. 15.23) (105). The
hydroxamic acid moiety is used in many inhibitors of
metallopeptidases.

Inhibition of NEP also prevents the degradation of atrial natriuretic
factor (ANF), a natural hypotensive peptide. Dual inhibitors of NEP and
ACE have been designed successfully because both enzymes share
significant structural homology, particularly in their active sites.
Simultaneous inhibition of both peptidases produces a more powerful
hypotensive response (106, 107). Several dual inhibitors are in phase
III clinical trial for treating hypertension (Fig. 15.24). Omapatrilat
(51, BMS-189921) is the furthest along as of late 2001 (105).

Figure 15.23. Examples of TSA as NEP inhibitors. [Structures not
reproduced; label: (48).]

Matrix metalloproteases (MMPs) are also inhibited by hydroxamic acids
and/or thiols. Over 25 variants of these enzymes are known, and some
are involved in diseases ranging from inflammation to metastatic cancer
(108). MMPs contain a zinc ion in the active site and function through
the metallopeptidase catalytic mechanism already discussed. However,
subtle differences between enzymes enable selective inhibitors to be
developed (109). Fig. 15.25 lists some of the reported MMP inhibitors
that use carboxylic acid (52-53), hydroxamic acid (54-55), or thiol
groups (56) as metal chelators.



Figure 15.24. Examples of TSA as dual ACE/NEP inhibitors. [Structures
not reproduced; recoverable labels in printed order: (51), NEP IC50 = 9
nM; sampatrilat, ACE IC50 = 7 nM, NEP IC50 = 20 nM; a third structure,
ACE IC50 = 25 nM, NEP IC50 = 3 nM.]

Other reported zinc binding chelators used in matrix metalloproteinase
inhibitors are summarized in Fig. 15.26. For instance, one of the
oxygens in the phosphonamide (57) binds strongly to the zinc ion,
whereas the other one coordinates weakly with the metal (110). More
recently, "suicide substrate" MMP inhibitors have appeared (58) (Fig.
15.26) (111). The selectivity of this type of compound arises from the
specific coordination of the thiirane with the active-site zinc ion,
which facilitates thiirane ring opening by nucleophilic attack by
neighboring Glu-404. This novel mode of binding was assessed by X-ray
absorption studies because of the difficulty of obtaining a suitable
crystal structure (111, 112).

Figure 15.25. Traditional TSA used to inhibit metallopeptidases.

ADAMs are membrane proteins that contain a disintegrin and a
metalloprotease domain. Disintegrins are RGD-containing proteins that
inhibit cell/matrix interactions (adhesion) and cell/cell interactions
(aggregation) through the integrin receptors. In addition, ADAMs have
two other domains that are involved in signaling and transport (113).

More than 25 ADAM proteases have been identified so far. ADAM 17 has
been shown to be TNF-α converting enzyme (TACE) (114). Inhibition of
TACE slows the production of TNF-α, a potent cytokine involved in
inflammatory responses to infection. Normally TNF-α produces a useful
response, but in some cases too much TNF-α is released, and inhibition
of TNF-α production would be therapeutically useful. Synthetic
inhibitors of this enzyme have been prepared. Clinical candidates like
Ro 32-7315 (59) (Fig. 15.27) are starting to emerge, and more are
expected in the near future (115, 116).

Aminopeptidases, enzymes that cleave off the N-terminal amino acid from
a peptide chain, are bismetallopeptidases, a class of metallopeptidase
that contains two metal ions in the catalytic site (117, 118). These
can be inhibited by compounds related to bestatin (60) (Fig. 15.28),
which contains the N-terminal α-hydroxy-β-amino acid residue, sometimes
referred to as norstatine. In leucine aminopeptidase, chelation
involves two pairs of groups: the amide carbonyl group with the
adjacent hydroxyl, and the hydroxyl with the N-terminal amino group
(119, 120).

6.3 TSA-Derived Cysteine and Serine 
Peptidase Inhibitors 

Classical TSA inhibitors of cysteine and serine proteases differ from
the metallo- and aspartic protease inhibitors in that they mimic the
tetrahedral intermediates for enzyme-catalyzed amide bond hydrolysis
only after a reversible chemical reaction between enzyme and inhibitor
takes place. Usually this involves the addition of the enzyme catalytic
nucleophile (the serine protease hydroxyl group or the cysteine
protease thiol group) to an electrophilic group in the inhibitor to
generate ketal-like species (121).

Figure 15.26. Novel TSA used to inhibit metallopeptidases.

Figure 15.27. Example of TSA as a TNF-α inhibitor.

Some of the serine and cysteine TSA moieties are shown in Fig. 15.29.
Selective inhibition between these two classes of protease can be
achieved easily. For example, trifluoromethylketones (61) and peptidyl
boronic acids (62) do not efficiently inhibit cysteine proteases.
However, selective inhibition of enzymes within each class can be very
difficult.

Figure 15.28. Proposed binding mode of bestatin. [Structure not
reproduced; label: bestatin (60).]

Many cysteine peptidases are involved in the biosynthesis and
degradation of biologically important peptides. Most early work was
done with papain, a cysteine peptidase isolated from the papaya fruit
and used in meat tenderizer many years ago. The readily available
source of this enzyme led to one of the very first X-ray crystal
structures of any peptidase (122, 123), despite the fact that no
cysteine peptidase was then known to be important in human pathology.
Since then, cathepsins B, H, L, and S were discovered to be involved in
biosynthetic steps in human immune response, inflammation, and other
biological processes. For example, cathepsin B is clearly involved in
the metastatic process and must act at some stage to permit transformed
tumor cells to migrate to other parts of the body; for 20 years, people
have sought inhibitors of cathepsin B as potential anti-metastatic
drugs (124). Cathepsin K was recently discovered and shown to be
involved in osteoporosis and bone regulation (125).

Inhibitors of cathepsin K illustrate the principles developed to
inhibit this class of enzyme. This enzyme sequence was detected in 1994
by sequencing of human DNA for the human genome project (126).
Cathepsin K was found to be inhibited by leupeptin (63) and by compound
(64), which surprisingly binds "backwards" to the active site (Fig.
15.30). A hypothesis to develop symmetrical inhibitors of cathepsin K
derived from the superposition of both aldehydes on the carbonyl
carbon; this led to the diamino ketone TSA (65). The diamino ketone
moiety seems to work in several classes of cysteine proteases (127).

Based on these results, Marquis et al. have recently described the
design and synthesis of conformationally constrained cyclic ketones as
highly potent and selective cathepsin K inhibitors (66-67) (Fig. 15.31)
(128). The labile stereogenic center at the α position of the ketone
was shown to be important for the binding mode and pharmacokinetic
profile of this type of inhibitor. The crystal structures of the two
epimers showed two alternate directions of binding to the enzyme active
site. In both structures, the primed region of the enzyme was occupied
by these inhibitors. Further investigation resulted in the azepanone
derivative (68) as a configurationally stable template for the
selective inhibition of this cysteine protease (Ki = 4.8 pM) (129).



Figure 15.29. TSA used to inhibit serine or cysteine peptidases.
[Structures not reproduced; labels: trifluoromethylketone (61); boronic
acid (62); diaminoketone; phosphonic acid; α-ketoamide.]

Figure 15.30. Structure-based design of cathepsin K inhibitors.


Caspases are involved in a variety of cell functions, especially in
programmed cell death (apoptosis). These enzymes recognize tetrapeptide
sequences ending in an aspartic acid recognition point:
X-Y-Z-Asp-NHR. Much effort has been expended in trying to obtain
selective inhibitors of the 14 different types identified to date. In
this context, selective inhibitors of caspase 1 or of caspase 3/7 have
recently been reported (130).
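The X-Y-Z-Asp recognition pattern described above can be illustrated
with a short scan of a one-letter amino acid sequence. This is only a
sketch of the pattern itself: the function name is hypothetical, the
input string is a made-up example, and a hit means only that a
tetrapeptide ends in Asp. Whether a given caspase actually accepts that
site depends on the X, Y, and Z residues and on structural context,
which this sketch ignores.

```python
def asp_terminated_tetrapeptides(sequence):
    """Return (position, tetrapeptide) pairs for every X-Y-Z-Asp window.

    `sequence` uses one-letter codes; `position` is the 1-based index of
    the Asp (D) residue that ends the tetrapeptide.
    """
    seq = sequence.upper()
    return [
        (i + 1, seq[i - 3:i + 1])
        for i in range(3, len(seq))
        if seq[i] == "D"
    ]


# Made-up heptapeptide containing DEVD, a commonly cited caspase-3 motif.
sites = asp_terminated_tetrapeptides("GDEVDGA")
print(sites)  # prints [(5, 'DEVD')]
```

The Asp at position 2 of the example is skipped because fewer than
three residues precede it.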

Peptidomimetic modifications of the tetrapeptide sequence have led to
the conformationally constrained compound (69), a selective inhibitor
of caspase-1 (interleukin-1β converting enzyme, ICE) and a potential
anti-inflammatory compound (131). Recently, new non-peptide
peptidomimetic diphenyl ether sulfonamides have been described as novel
lead structures (70) (Fig. 15.32) (132).

Finally, researchers from GlaxoSmithKline have identified potent and
selective inhibitors of caspases 3 and 7 that lack the required
carboxyl group in P1 (71) (Fig. 15.32). The X-ray co-crystal structure
reveals the formation of the typical tetrahedral intermediate of the
isatin type structures, which may compromise their selective inhibition
of proteases (133, 134).

These reversible caspase inhibitors differ from inhibitors that form
irreversible covalent bonds, the so-called "dead-end" or "suicide"
inhibitors of these enzymes. For example, the α-acetoxy ketone (72) in
Fig. 15.32 is an alkylating irreversible inhibitor; the enzyme
cysteinyl group displaces the α-acetoxy group to form an irreversible
covalent bond (135).


7 SPEEDING UP PEPTIDOMIMETIC
RESEARCH

As mentioned before, combinatorial chemistry, high-throughput
screening, and analogous techniques have become powerful tools to
promote drug discovery in peptidomimetic research. It is not the
intention of this chapter to summarize all these methods, and excellent
reviews are available in the literature (136-140). However, one
successful approach developed at Merck for the rapid identification of
selective agonists of the somatostatin receptor through combinatorial
chemistry should be highlighted, because it illustrates the evolution
of a constrained peptide into a non-peptide peptidomimetic structure
(141).

Figure 15.31. Cyclic ketones in novel cathepsin K inhibitors.

Figure 15.32. Examples of TSA as caspase inhibitors.

Figure 15.33. Somatostatin receptor agonists found through
combinatorial chemistry.

A series of combinatorial libraries were constructed on the basis of
molecular modeling of known peptide agonists like MK-678 and
octreotide. A chemical collection of 200,000 compounds was screened,
giving priority to the residues Tyr-Trp-Lys, important pharmacophores
in somatostatin determined first by Veber et al. (31). This approach
yielded five compounds (73-77) (Fig. 15.33), each being selective for
one of the somatostatin receptor subtypes: sst1 (73), sst2 (74), sst3
(75), sst4 (76), and sst5 (77).


8 TOWARD RATIONAL DRUG DESIGN: 
DISCOVERY OF NOVEL NON-PEPTIDE 
PEPTIDOMIMETICS 

Current pharmaceutical research has taken advantage of newer
computational methods, the so-called computer-aided drug design, and
other physicochemical techniques such as X-ray crystallography and NMR
(142). The main goal in rational drug design is to translate the
structural information in the native peptide into low molecular weight
non-peptidic molecules. Over the past years, many 3D structures of
biological targets have been solved and have been successfully used to
design new, pharmacologically useful compounds (vide infra). Different
computer-aided design methods, e.g., 3D pharmacophore models, 3D
quantitative SAR (QSAR), docking, and de novo design, have been
extensively reviewed elsewhere (75, 143-146).

Figure 15.34. Examples of GRAB peptidomimetics.

Recently, the importance of generating inhibitors that target receptor
conformational ensembles has been pointed out (10). This method goes
beyond the current docking of known structures to known active site
conformers and can lead to type III and GRAB peptidomimetics.

The concept of Group Replacement Assisted Binding (GRAB)
peptidomimetics derives from the discovery at Roche of the piperidine
class of renin inhibitors. The non-peptide inhibitors of renin (78)
(Ki = 26 μM) and (79) (Ki = 31 nM) (Fig. 15.34) (84, 147-149) stabilize
an enzyme active site conformation different from the β-strand binding
enzyme conformation typical for other peptidase inhibitors. A close
analysis of the X-ray crystal structure of the enzyme-inhibitor complex
shows that the piperidine C4-phenyl group binds to the enzyme to
replace Tyr75, which has rotated to another position. Interestingly,
Leu73 also rotates to fill some of the vacated Tyr75 pocket, and this
in turn allows Trp39 to occupy a new site formed in part by the vacated
Leu73 (Fig. 15.35). This cascade of conformational transitions in the
renin active site allows the optimized inhibitor to stabilize an enzyme
conformation not observed when the classic peptide-derived
peptidomimetics bind. This stabilization process is defined as a group
replacement process, and the piperidine inhibitors constitute a new
type of peptidomimetic: GRAB peptidomimetics.

Comparison of (78) and (79) with the structures of other
peptide-derived inhibitors revealed how the different enzyme active
site conformations were found. Bursavich et al. have successfully
extended the initial renin modeling to the design of inhibitors of two
other aspartic peptidases, pepsin and R. chinensis pepsin: (80)
(Ki = 2 μM) and (81) (Ki = 0.2 μM) (Fig. 15.34) (150).

Figure 15.35. GRAB peptidomimetics in action. See color insert.

The extended β-strand binding conformation could be changed into the
piperidine binding conformation by a series of low-energy,
mechanistically related conformational changes in active site
side-chains. The discovery of the Roche inhibitors and the correlation
of these structures with peptide-derived inhibitors are analogous to a
peptidomimetic "Rosetta Stone." This design strategy has the potential
for designing novel types of peptidomimetic structures.

9 HISTORICAL DEVELOPMENT OF 
IMPORTANT NON-PEPTIDE 
PEPTIDOMIMETICS 

9.1 HIV Protease 

Type-I HIV-1 protease inhibitors, Saquinavir, Ritonavir, Indinavir,
Amprenavir, Viracept (nelfinavir mesylate), and Lopinavir (Fig. 15.20)
are established drugs for the treatment of AIDS. All these inhibitors
employ the central hydroxyl transition-state mimetic as a scaffold on
which varying functionality was systematically added until the required
balance between potency, in vivo activity, and oral absorption was
achieved. In general, the binding interactions were optimized through
iterative synthesis and co-crystallization of inhibitor with enzyme,
molecular modeling, and redesign of the inhibitor side-chains.
Pharmacokinetic properties were addressed only after the initial
inhibitor was identified and optimized. Compounds (82-83) (Fig. 15.36)
are highly modified peptidic structures that stabilize the enzyme-bound
extended β-conformation (151, 152).

Another approach to achieve greater in
vivo activity is to start with a molecular
template with proven useful pharmacokinetics
and oral bioavailability and to selectively
modify it to achieve the desired activity.
Identification of the orally active anticoagulant
warfarin (84) (Fig. 15.37) as a weak inhibitor
(IC50 = 18 µM) of HIV protease was followed by two
reports of 4-hydroxycoumarins as possible
type III HIV inhibitors. Subsequent SAR studies
led to the more potent
5,6-dihydro-4-hydroxy-3-pyrone inhibitor (85)
(IC50 = 2.7 nM), which has good antiviral activity
(EC50 = 0.5 µM) and is orally bioavailable (153).
Upjohn researchers also used a structure-based
design approach based on warfarin to obtain (86),
their clinical candidate PNU-140690 (154). It
should be noted that both inhibitors bind to
the extended β-strand binding active site
conformation.

Workers at DuPont used a pharmacophore 
model and database search to develop the first 
type III mimetic inhibitor of HIV protease, 
DuP 450 (87) (Fig. 15.38). This evolved from a
3D pharmacophore that retained two key
interactions: replacement of the flap-bound water
and a hydroxyl transition-state isostere (155).
Molecular modeling led to a cyclohexanone as 
a better spacer between these groups, and fi¬ 
nally the seven-membered cyclic urea (87) was 
created (Fig. 15.38). The development of these 
inhibitors illustrates the importance of con¬ 
formational analysis in the design of con¬ 
strained analogs. 

Surprisingly, the symmetric cyclic sulfonyl-urea
derivative analog (88) (Fig. 15.38, Ki = 3 nM)
binds differently in the active site
and adopts a flipped conformation (156).
Moreover, the SAR of the cyclic urea and cyclic
sulfamide inhibitors does not follow a
straightforward pattern. These contradictory
results clearly illustrate the structural
diversity created by a subtle structural
modification in two otherwise related
peptidomimetic protease inhibitors.

Figure 15.36. β-Strand HIV protease inhibitors.

The peptidase inhibitors (82) and (83) are
actually amino acid and transition-state mimics
pieced together to emulate the typical
ligand-bound extended β-strand inhibitor
conformation. The structurally distinct
heterocyclic aspartic protease inhibitors (85-86)
and (87-88) are non-peptide peptidomimetics
because of their remote structural relationship
to native peptide substrates. Yet these two
distinct peptidomimetic classes bind to the same
active site topography. These structurally
distinct peptidomimetics selectively stabilize
closely related enzyme conformations.

9.2 Thrombin 

Thrombin and Factor Xa are both serine pro¬ 
teases involved in the blood coagulation cas¬ 
cade. Inhibition of these two enzymes is pro¬ 
viding novel anticoagulants (157-159). 


The development of thrombin inhibitors
that lack the functionalized TSA highlights a
major new approach to type I peptidomimetics.
In 1995, a Lilly group found that
D-Phe-Pro-Agmatine analogs showed increased
selectivity for thrombin over other fibrinolytic
enzymes despite a 100-fold loss in potency
caused by the removal of the aldehyde group
(160). These studies paved the way for
Merck's development of picomolar thrombin
inhibitors (161, 162), which use a similar
motif. Removal of an α-ketoamide transition
state mimic from L-370,518 (89) (Fig. 15.39,
Ki = 0.09 nM) led to an expected 100-fold drop
in potency for (90) (Ki = 5 nM). However,
systematic modification of the P3 position
restored potency and led to an inhibitor (91)
with a Ki = 2.5 pM. Interestingly, potency
seems to be enhanced by a fortuitous hydrophobic
collapse into a favorable binding
conformation.
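The potency changes quoted above can be put on a common scale with a small calculation. The following is only a sketch: it takes the three Ki values reported for (89), (90), and (91), computes fold changes, and converts a Ki ratio into an approximate binding free-energy difference using ΔΔG = RT ln(Ki,b/Ki,a). The temperature of 298.15 K and the function names are assumptions made for illustration and are not taken from the cited studies.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def fold_change(ki_ref_nM, ki_other_nM):
    """How many times weaker 'other' binds than 'ref' (ratio of Ki values)."""
    return ki_other_nM / ki_ref_nM

def ddg_kcal(ki_ref_nM, ki_other_nM, temp_K=298.15):
    """Approximate free-energy penalty (kcal/mol) of 'other' relative to 'ref'.

    Assumes Ki behaves as a dissociation constant and T = 298.15 K.
    """
    return R_KCAL * temp_K * math.log(ki_other_nM / ki_ref_nM)

# Ki values quoted in the text, all expressed in nM
KI_89 = 0.09      # L-370,518, with the alpha-ketoamide
KI_90 = 5.0       # after removal of the transition-state mimic
KI_91 = 2.5e-3    # 2.5 pM, after P3 optimization

loss = fold_change(KI_89, KI_90)          # potency lost on removing the TSA
gain = fold_change(KI_91, KI_90)          # potency regained at P3
penalty = ddg_kcal(KI_89, KI_90)          # energetic cost of the loss

print(f"loss: {loss:.0f}-fold, gain: {gain:.0f}-fold, penalty: {penalty:.1f} kcal/mol")
```

On these numbers the loss is roughly 56-fold, the same order of magnitude as the 100-fold figure quoted, and the P3 modifications more than recover it.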

Thrombin inhibitors (92) and (93) illustrate
a novel type III peptidomimetic. Most
protease inhibitors bind in an extended
β-strand conformation that is stabilized by
multiple enzyme-ligand hydrogen bonds.
However, Boehringer Mannheim developed
thrombin inhibitors (92) (Fig. 15.40) that lack
these H-bonds (163). This idea was exploited
by researchers at 3D Pharmaceuticals, who
were able to crystallize (93) in the active site
(164, 165). In this example, the benzene ring
acts as a scaffold to display the three different
substituents to fill the three principal binding
pockets.

Figure 15.37. Warfarin analogs as non-peptide HIV protease inhibitors.
Panels: Parke-Davis X-ray structure (85); Pharmacia-Upjohn X-ray
structure (86); warfarin (84). Labeled residues: Asp25, Asp125.

Figure 15.38. Cyclic ureas as non-peptide HIV
protease inhibitors.

Other type III peptidomimetic inhibitors of
thrombin have been developed from screening
leads (166, 167) such as inhibitor (94) (Fig.
15.40). SAR led to the design of (95). Inhibitor
(96) was derived from docking studies with
the 5-amidino indole nucleus, followed by
addition of a lipophilic side-chain to interact
with the important S3 subsite of thrombin. The
crystal structures of both (95) and (96) in the
active site of thrombin show that the aromatic
core binds in the S1 site as expected but
does not pick up hydrogen bonding from the
important active site sequence Ser214-Gly216
(168). Both crystal structures showed
a similar binding mode, in which interaction of
the C-2 side-chain with Trp60D might explain
the high thrombin selectivity observed for this
series (169).

Figure 15.39. Non-TSA thrombin inhibitors.

Another type III peptidomimetic inhibitor
was derived from the crystal structure of a
bicyclic [3.1.3] inhibitor (170) complexed to
thrombin (97) (Fig. 15.41). The X-ray structure
revealed that one of the carbonyls was
oriented towards the hydrophobic P-pocket
(S2). The desolvation necessary to place a
carbonyl in a hydrophobic pocket is unfavorable,
and various alkyl groups were used as possible
replacements. This led to the potent (Ki = 13
nM) and selective (>760-fold for thrombin over
trypsin) inhibitor (98).


One of the major drawbacks in thrombin
inhibitor design was the requirement for a basic
side-chain in P1 needed to form a salt
bridge to enzyme Asp189. However, the other
amino acid side-chains in S1 are largely
lipophilic and neutral. This feature suggested
that the strongly basic group in P1 could be
replaced by a weaker base or even with
hydrogen-bonding groups. Compounds (99-100)
are representative of this strategy (Fig. 15.40)
(171). An X-ray crystal structure of (99) shows
a new binding mode in which the formamide
group points out of the S1 pocket and forms
new hydrogen bonds with Gly219 (172). The
ability to obtain crystal structures of
thrombin-inhibitor complexes for many of the
inhibitors shown in Figs. 15.40-15.41
establishes that most are type III peptidomimetics.

9.3 Factor Xa 

New approaches to design inhibitors of Factor 
Xa as potential anticoagulants have been re¬ 
viewed (173), and important type III mimetics 
have been described (Fig. 15.42). All inhibitors 
contain amidine or basic groups that bind in 
the enzyme's S1 site; none of the inhibitors
contains a classical electrophilic center of the 
type employed in TSA inhibitors (174-180). 

Compound (101) (Fig. 15.42) was designed
from a strategy involving connection of a
three-point pharmacophore derived from
molecular modeling. Beginning with the X-ray
structure of the Factor Xa dimer, Gong et al.
(176) envisioned three important enzymatic
contact points: a phenylamidine in the S1
subsite, a phenylamidine in the S4 site, and a
carboxylate moiety postulated by a group at
Daiichi to confer selectivity over thrombin
through an interaction with Gln192 of Factor
Xa. Systematic iterative modifications led to
the potent inhibitor (101) (Ki = 9 nM), which
has 350-fold selectivity over thrombin. This
approach highlights a truly de novo method
where fragments were docked into the active
site and an appropriate spacer was chosen to
connect them. Further SAR data led to
modifications that improved both potency and
selectivity (176).

9.4 Glycoprotein IIb/IIIa (GP IIb/IIIa)

Some outstanding examples of the use of
conformational restriction to characterize the
bioactive conformations of Arg-Gly-Asp
peptidomimetic antagonists illustrate the present
state of the art. Members of the integrin family
of receptors recognize and bind the peptide
sequence Arg-Gly-Asp as an important step
in platelet aggregation and other physiological
processes (181), and competitive antagonists
for this process could serve as potential drug
candidates. Much effort has been directed
toward identifying small ligands that might
mimic the RGD peptide sequence (182). This
drug design concept was supported by the fact
that protein antagonists of integrin receptors
are known that contain the RGD sequence
(183) and that small peptide sequences
containing the RGD moiety weakly antagonize
the endogenous ligand (184). Consequently,
several groups synthesized conformationally
restricted derivatives of small peptides as
starting points for developing metabolically
stable peptides or peptidomimetics. Ali et al.
(185) synthesized a series of disulfide
derivatives of the RGD sequence, which were
designed by analogy with the somatostatin work
(vide supra). Excellent antagonists related to
(102) were obtained. Further constraint of the
peptide system by use of the o-thiol benzene
derivatives led to the novel antagonist SKF
107260 (103) (Fig. 15.43), a good inhibitor of
both platelet aggregation and binding to
GPIIb/IIIa. Barker et al. (186) followed a
similar strategy but used cyclic sulfides as an
additional conformationally restricting element.
These derivatives had the advantage of being
rapidly synthesized by solid phase methods.
Systematic structure-activity studies with
respect to the amino acid preceding the RGD
sequence and the chirality of sulfoxide
derivatives led to the discovery of G-4120 (104),
a potent, biologically active derivative.

Figure 15.40. Non-H-bonding-based thrombin inhibitors.

The conformation of both (103) and (104)
in water was found to be highly constrained,
and a single predominant conformation could
be characterized in aqueous solution by use of
NMR methods and computational chemistry
(185, 187). This bioactive conformation
defined the topographical placement of the
arginine guanidine group and the aspartic
carboxyl group, and was superimposed onto a
conformationally restricted template of a class
of compounds with generally suitable
pharmacodynamic properties. In this case, the
benzodiazepine ring system was used, and the
strategy generated the low molecular weight
non-peptide RGD receptor antagonists (105-107)
(Fig. 15.44), which contain at least two
conformational restrictions, the bicyclic
heterocycle and the acetylene linker. The
compounds shown in Fig. 15.44 represent what can
be achieved by applying the principles of
conformational restriction to peptides when no
X-ray or NMR structural information is available
for the complex between ligand and receptor.
Benzodiazepines (105-107) represent the first
type III peptidomimetics designed de novo by
systematically modifying a natural
receptor-binding peptide (187-189).

Figure 15.41. Optimized and P2 thrombin inhibitors.

A variety of other scaffolds have been
developed by exploiting the idea that glycine
represents a spacer between the two important
recognition residues Arg and Asp. This
template-based approach positions the key side
functionality, a basic function and an acidic
one, within a distance of 11-17 Å, as required
for presentation to the receptor. Several examples
of these scaffolds are shown in Fig. 15.45 (190-195).
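The distance criterion described above lends itself to a simple geometric filter. The sketch below is illustrative only: it assumes each candidate scaffold has already been reduced to two 3D points (one for the basic group, one for the acidic group, in Å), and it checks whether their separation falls in the 11-17 Å window quoted in the text. The coordinates and function name are hypothetical.

```python
import math

def within_rgd_window(basic_xyz, acid_xyz, lo=11.0, hi=17.0):
    """True if the basic-to-acid distance (in angstroms) lies in [lo, hi].

    The default window is the 11-17 angstrom range quoted in the text;
    the two points are assumed to be pre-computed group centers.
    """
    return lo <= math.dist(basic_xyz, acid_xyz) <= hi

# Hypothetical group centers for two made-up scaffolds (angstroms)
extended = ((0.0, 0.0, 0.0), (12.0, 5.0, 0.0))   # 13 A apart
compact = ((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))     # 5 A apart

print(within_rgd_window(*extended))
print(within_rgd_window(*compact))
```

A filter of this kind says nothing about the relative orientation of the two groups, so it can only serve as a first pass before conformational analysis.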

Recent results suggest that the RGD tripeptide
can adopt multiple conformations that
allow tight binding to the receptor. This theory
is supported by the fact that nonpeptide RGD
peptidomimetics can adopt a range of different
topographies such as found in cupped,
turn-extended-turn, or β-turn conformations (196).

RGD type I peptidomimetics are usually
poorly bioavailable compounds because of the
presence of multiple hydrogen-bonding sites
plus the charged polar functional groups at
both ends. Esters or coumarin (197) linkers
have been used to provide orally available
prodrugs, and bioisosteric replacements of the
guanidinium group by a pyridine (198),
tetrahydronaphthyridine (199), or
aminobenzimidazole (200) moiety provided more
bioavailable analogs.

Figure 15.42. Examples of FXa protease inhibitors.

9.5 Ras-Farnesyltransferase 

Inhibitors of Ras-farnesyltransferase have
been developed by mimicking the C-terminal
CAAX motif (where C is a cysteine residue, A
is any aliphatic amino acid, and X is usually
Met, Ser, or Ala). This tetrapeptide is the
signal for farnesylation of Ras proteins.
Ras-farnesyltransferase is one of the most
promising targets for novel anti-cancer drugs,
because at least 30% of human cancers contain
mutated Ras (201, 202).
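The CAAX definition given above can be written as a simple sequence pattern. This is a minimal sketch rather than a validated predictor: it treats "aliphatic" as Ala, Val, Leu, or Ile, which is an assumption made here, and it restricts X to the three residues named in the text even though the text says only that X is "usually" one of them. The example sequences are illustrative strings.

```python
import re

ALIPHATIC = "AVLI"   # assumption: which residues count as "aliphatic"
X_RESIDUES = "MSA"   # Met, Ser, or Ala, as listed in the text

# C, two aliphatic residues, then X, anchored at the C-terminus
CAAX = re.compile(rf"C[{ALIPHATIC}]{{2}}[{X_RESIDUES}]$")

def has_caax(sequence):
    """True if a one-letter sequence ends in a CAAX-like tetrapeptide."""
    return CAAX.search(sequence.upper()) is not None

for seq in ("GGKCVLS", "GGKCVIM", "GGKCVLD", "CVLSGG"):
    print(seq, has_caax(seq))
```

Anchoring the pattern at the end of the string reflects the requirement that the motif be C-terminal; an internal CVLS would not match.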

Two types of peptidomimetic structures
have been used to develop inhibitors (203).
Some typical type I inhibitors were generated
by replacing the amide backbone with different
isosteres like the oxymethylene amide
bond in (108) (Fig. 15.46, IC50 = 60 nM) (204).
The central dipeptide segment of CA1A2X has
been replaced with rigid linkers like the
3-aminomethylbenzoic acid (AMBA) in (109) (205).
This novel inhibitor was not farnesylated,
showing that the two amino acids in the middle
of the CAAX tetrapeptide are required for
farnesylation. An imidazole group has been
used to replace the thiol group of the CAAX
motif to produce compound (110) (206).

Figure 15.43. Conformationally restricted RGD cyclic peptides.

Figure 15.44. Benzodiazepine RGD analogs.

An outstanding example in peptidomimetic 
design evolved from these studies. Truncation 
and conformational restriction of a reduced 
isostere of the parent peptide substrate, fol¬ 
lowed by systematic replacement of the pep¬ 
tide-like side-chains provided the potent non- 
peptidic inhibitor (111) (Fig. 15.46) (207). 
This approach highlights the transition from a 
peptide-derived structure to a compound with 
no apparent resemblance to the original pep¬ 
tide. 

Recently, crystal structures of
farnesyltransferase complexed with a farnesyl group
donor and the native substrate or a type I
peptidomimetic show the structural basis for
inhibition of this enzyme. The X-ray data show
that the CAAX motif adopts an extended
conformation rather than a β-turn, which is the
conformation observed by transferred nuclear
Overhauser effect experiments; coordination
of the cysteine side-chain to the Zn ion
promotes the conformational change in the
peptide backbone. Moreover, differences in the
conformational binding mode of peptides and
peptidomimetics are one of the bases for
selective farnesylation (208).

Figure 15.45. Different scaffolds used in RGD mimetics.

Other type III peptidomimetic inhibitors of
this enzyme have also been reported. Inhibitor
(112) (Fig. 15.47) was developed by replacing
the A1A2 dipeptidyl sequence with a
benzodiazepine scaffold (209). Later, SAR
modifications of the benzodiazepine nucleus that
included a hydrophilic 7-cyano group and a
4-sulfonyl group provided the potent, orally
available, and in vivo active (113) (210).

HTS also produced several non-peptide
leads typified by inhibitor SCH 47307 (114)
(Fig. 15.47) (211). Subsequent SAR work led
to the potent inhibitor SCH 66701 (115) (Ki =
1.7 µM), which was crystallized within the
enzyme active site (212). This series of
compounds is completely non-peptidic and also
lacks the free sulfhydryl or imidazole seen in
the other inhibitors discussed here. This is a
breakthrough that shows that potency can be
achieved even without the "essential" cysteine
or sulfhydryl mimic.

9.6 Non-Peptidic Ligands for Peptide 
Receptors 

This section illustrates the successful
development of non-peptide peptidomimetics from a
screening lead by assuming the inhibitor
binds to the receptor in the same way as does
the native peptide hormone. These assumptions
actually led to effective inhibitors of the
receptor. Later, site-directed mutagenesis of
target receptors suggested that for many of
these compounds, the mimetic was binding to
the receptor at ancillary, perhaps overlapping,
sites on the receptor. Later still,
pharmacological studies indicated that peptide
receptors adopted multiple states, suggesting
that different antagonists might bind to
different receptor forms. Of course, if
compounds do not bind to the same receptor site
as the endogenous hormone, SAR data collected on
the natural peptide substrate are not applicable
to these antagonists. Most of these
peptidomimetics are probably type II or
functional mimetics. Yet the success of this
approach suggests that at least for some
non-peptide antagonists, there may be some
congruent structure that interacts with the
receptor. These issues will only be determined
unambiguously when high-resolution structures of
the G-protein-coupled receptors (213) and
other constitutive receptor systems are
determined.

Figure 15.46. Peptide-like Ras-farnesyltransferase inhibitors.

9.6.1 Angiotensin II. The first non-peptide
antagonists of the AT1 receptor were found by
HTS. The imidazole (116) (Fig. 15.48) is a
weak (IC50 = 43 µM) but quite selective A-II
receptor antagonist (214). Using this as a lead
compound, DuPont and SmithKline Glaxo
researchers independently developed potent
small molecule A-II receptor antagonists. The
DuPont group used the conformation suggested
by Smeby and Fermandjian to guide the
design (215). It was speculated that the
carboxyl group and the imidazole group of (116)
were bound to the A-II terminal carboxyl
group and to the imidazole group, respectively.
This rationalization culminated in the
synthesis of nanomolar inhibitors, with
compound (117) as a clear representative (216).

Although workers at SmithKline Glaxo
used the same conformation as a starting point,
they postulated other binding modes to the
receptor. One of their alternative hypotheses
considered compound (116) as a constrained
analog in which the benzyl and the carboxyl
groups corresponded to the Tyr side-chain and
the C-terminal carboxyl group of A-II.
Following this hypothesis, modification of lead
compound (116) eventually led to compound (118)
(Fig. 15.48) with an IC50 = 1.45 nM and oral
activity of 30% (217).

Site-directed mutagenesis studies on the
AT1 receptor revealed differences in the
binding site of angiotensin and the small molecule
non-peptide compounds (119-120) (Fig. 15.49).
There is no evidence that the single residues
involved in inhibitor binding overlap with
endogenous peptide binding.

Figure 15.47. Non-peptidic Ras-farnesyltransferase inhibitors.

Some other non-peptide agonists have also
appeared in the literature. Surprisingly, their
binding mode differs from the binding mode of
the peptide agonist (121), as well as that of the
structurally similar non-peptide antagonist
(122) (Fig. 15.49) (218). However, angiotensin
and L-162,313 (122) require common critical
residues for angiotensin AT1 receptor
activation (219).

9.6.2 Substance P. The tachykinin receptors
(NK-1, NK-2, and NK-3) and their endogenous
ligands, the tachykinins and neurokinins,
are important in neurotransmission (220-222).
Antagonists of tachykinin receptors
produce beneficial effects in several disease
states such as pain, asthma, emesis, and
depression.

A general approach for converting a variety
of peptide structures into small, type II
peptidomimetic antagonists was devised by
Horwell and colleagues and is illustrated here for
antagonists to Substance P. An alanine scan of
the parent undecapeptide revealed that the
Phe7-Phe8 sequence was required for binding.
Replacement of one of these residues by Trp,
followed by introduction of conformational
constraints by α-alkylation, provided the
subnanomolar inhibitor (123) (Fig. 15.50) (223).
Improved brain penetration was achieved by
amine (124) (224).
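The alanine scan mentioned above is a systematic procedure: each residue of the parent peptide is replaced in turn by alanine and the analog is assayed. As a rough illustration of how the analog set is enumerated, the sketch below generates the eleven single-Ala variants of Substance P from the sequence shown in Fig. 15.50. The C-terminal amide is omitted for simplicity, and the function name is hypothetical; no binding data are implied.

```python
# Substance P, residues 1-11 (C-terminal amide omitted in this sketch)
SUBSTANCE_P = ["Arg", "Pro", "Lys", "Pro", "Gln", "Gln",
               "Phe", "Phe", "Gly", "Leu", "Met"]

def alanine_scan(residues):
    """Return (position, original residue, analog sequence) for each
    single-residue Ala replacement; positions are 1-based."""
    variants = []
    for i, res in enumerate(residues):
        if res == "Ala":
            continue  # replacing Ala with Ala gives no new analog
        mutated = residues[:i] + ["Ala"] + residues[i + 1:]
        variants.append((i + 1, res, "-".join(mutated)))
    return variants

for position, original, analog in alanine_scan(SUBSTANCE_P):
    print(f"{original}{position} -> Ala: {analog}")
```

In an actual scan, the analogs at positions 7 and 8 would be the ones whose loss of binding flags the Phe-Phe pair as essential.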

Chemical screening of corporate compound
libraries resulted in the discovery of another
type of non-peptidic NK-1 antagonist,
CP-96,345 (125) (Ki = 0.66 nM, Fig. 15.51). This
compound heralded a breakthrough in the design
of these potential drugs (225, 226).
Replacement of the basic quinuclidine ring with
a morpholine core improves duration of action,
and insertion of an amino triazole unit confers
excellent solubility and CNS penetration
(126) (Ki = 0.19 nM, Fig. 15.51) (227).

Figure 15.48. Angiotensin II inhibitors derived from an HTS lead.

Dual NK-1/NK-2 inhibitors, e.g., (127)
(Fig. 15.52), have recently been designed by
determining the important sites for maintaining
NK-2 selectivity of the lead compound
SR-48968 and introducing NK-1 pharmacophore
groups (228-230).

Fewer NK-3 selective receptor antagonists
have been described, but a quinoline scaffold
previously reported to be a selective NK-3
receptor antagonist has been converted to a
dual NK-2/NK-3 inhibitor (128) (Ki = 0.8 nM
NK-2 and 0.8 nM NK-3). The lead optimization
was carried out by docking potential
structures into a novel receptor model. The
theoretical model compares closely with the
recently published crystal structure of
rhodopsin (231).


9.6.3 Neuropeptide Y. Neuropeptide Y
(NPY) is a 36 amino acid polypeptide that is
involved in hormonal, sexual, and cardiac
effects (232, 233). In 1994, two "first
generation" type I NPY-Y1 selective antagonists,
BIBP 3226 (129) (234) and SR120107A (130) (235),
were reported (Fig. 15.53). BIBP 3226 (129)
corresponds to a truncated and modified peptide
in which the D-Arg is assumed to correspond
with Arg33 in Neuropeptide Y.

More recently, a series of indole Y1
antagonists discovered by screening (236) led to
the benzimidazole (131) (Ki = 0.052 nM). In this
type of compound, the diamino moieties are
postulated to mimic the two C-terminal
arginines of NPY (237).

Selective NPY-Y5 inhibitors have been shown 
to inhibit food intake activity in vivo. Most inhibi¬ 
tors found by HTS and lead optimization gave 
nanomolar and selective antagonists. It is not 
known whether these are functional or topological 
mimetics (Fig. 15.54) (238-240). 

9.6.4 Growth Hormone Secretagogues.

Growth hormone (GH) releasing peptide mimetics
have become attractive alternatives to
GH replacement therapy (241). The peptidyl
GH secretagogue GHRP-6 (242) was used to
develop the clinical candidate MK-0677 (132)
(Fig. 15.55, EC50 = 1.3 nM) (243, 244). After
identifying the important residues for
bioactivity in GHRP-6, the Merck group began
searching other receptor libraries for known
"privileged structures" in a combinatorial
synthetic fashion (see Section 4) (66). The
more active derivative contained a
spiropiperidine moiety attached to an indoline ring.

Figure 15.49. Examples of angiotensin II inhibitors.
Structures labeled: L158809 (121); (122).

Figure 15.50. Development of a substance P inhibitor.
Substance P: Arg-Pro-Lys-Pro-Gln-Gln-Phe-Phe-Gly-Leu-Met-NH2.
R = CH3, (123); R = CH2N(CH3)2, (124).

More recently, ghrelin has been isolated 
and identified as an endogenous ligand of the 
GHS receptor and some new peptidomimetic 
structures [e.g., 133 (Fig. 15.55)] have started 
to appear (245). 

In another approach, SAR studies and
systematic simplification of GHRP-6 at Novo
Nordisk produced the orally bioavailable
derivative NN-703 (134). Molecular modeling
overlapping of NN-703 (134) (Fig. 15.56) and
MK-0677 (132) (Fig. 15.55) showed structural
similarities between both compounds. Highly
potent hybrids of Ipamorelin and NN-703
(e.g., 135) (Fig. 15.56) have also been
described (246, 247).

Figure 15.51. Examples of NK-1 antagonists.
Structures labeled: CP-96345 (125); (126).


A common 3D pharmacophore was recently 
described for peptidic and non-peptidic GH 
secretagogues by means of computational 
chemistry. After QSAR analysis, four pharma- 
cophoric sites were found: two aromatic rings, 
a proton acceptor, and a protonated amine. 
Using these strategies, some nanomolar an¬ 
tagonists [e.g., 136 (Fig. 15.56)] were discov¬ 
ered (248). 
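The four-site pharmacophore described above can be expressed as a minimal feature checklist. The sketch below is an assumption-laden illustration: it represents a compound as a flat list of feature labels (the label strings are invented here) and tests only whether the required counts are present. It deliberately ignores the 3D arrangement of the sites, which is the essential part of a real pharmacophore model.

```python
from collections import Counter

# Feature counts taken from the text: two aromatic rings,
# one proton acceptor, one protonated amine.
REQUIRED = Counter({
    "aromatic_ring": 2,
    "proton_acceptor": 1,
    "protonated_amine": 1,
})

def satisfies_pharmacophore(features):
    """True if the feature list contains at least the required counts.

    'features' is a list of label strings; geometry is not considered.
    """
    have = Counter(features)
    return all(have[name] >= count for name, count in REQUIRED.items())

# Hypothetical annotated compounds
candidate_a = ["aromatic_ring", "aromatic_ring",
               "proton_acceptor", "protonated_amine"]
candidate_b = ["aromatic_ring", "proton_acceptor", "protonated_amine"]

print(satisfies_pharmacophore(candidate_a))
print(satisfies_pharmacophore(candidate_b))
```

A count-based screen like this can only discard compounds that lack a feature outright; distances and angles between the sites would still have to be checked against the model.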

9.6.5 Endothelin. The first report of
endothelin in 1988 stimulated a huge effort to
develop selective and non-selective endothelin
receptor (ETA and ETB) antagonists (249,
250). One successful approach derived from
the postulate that the phenyl groups of the
screening lead might mimic two of the aromatic
side-chains (Tyr13, Phe14, or Trp21) of
ET-1 (251, 252). Knowing that the carboxylic
acid was also necessary for good activity,
researchers at SmithKline overlaid their
inhibitor with the side-chains of Tyr13, Phe14,
and Asp18 in ET-1. After using a conformationally
constrained analog of ET-1 to further define
their NMR-derived structure of ET-1, the final
overlay suggested that a carboxylic acid
attached with a linker of two to three atoms on
the 2-position of the phenyl ring would provide
further binding interaction by mimicking
the C-terminal carboxylic acid. This led to
compound (137) (Fig. 15.57), a potent antagonist
of both the ETA and ETB receptors with
Ki = 0.43 and 15.7 nM, respectively. Analogs
based on a pyrrolidine scaffold are also
effective (e.g., 138) (Fig. 15.57) (253).

Figure 15.52. Dual NK-1/NK-2 and NK-2/NK-3 inhibitors.

The Kohonen neural network has been
used to develop bioisosteres of the
methylenedioxyphenyl group found in a variety of
antagonists [e.g., 139 (Fig. 15.58)]. The
benzothiadiazole (140) functions as a bioisostere
that retains and sometimes improves binding to
the ETA receptor (254-256).

Figure 15.53. Examples of neuropeptide Y1 inhibitors.
Neuropeptide Y: Tyr-Pro-Ser-Lys-Pro-Asp-Asn-Pro-Gly-Glu-Asp-Ala-
Pro-Ala-Glu-Asp-Leu-Ala-Arg-Tyr-Tyr-Ser-Ala-Leu-Arg-His-Tyr-Ile-
Asn-Leu-Ile-Thr-Arg-Gln-Arg-Tyr-NH2. Structure labeled: BIBP3226 (129).


Since the discovery of Ro46-2005 (141)
(Fig. 15.58), the first orally active ET
inhibitor, major efforts have been made to modify
arylsulfonamide derivatives. An isoxazole as
the heterocycle attached to the amino
functionality provided selectivity against ETA
receptor (257) and led to BMS 193884 (142)
(Ki = 1.4 nM) (258) and others, e.g., TBC 3214
(143) (Ki = 0.04 nM) (259), which are potent,
selective, and orally available ETA antagonists.

Figure 15.54. Examples of neuropeptide Y5 inhibitors.

Different binding modes have been proposed
for ET antagonists. The acid or sulfonamido
groups are needed to interact with a
cationic site in the receptor, and an aromatic
interaction with Tyr129 is postulated to be
responsible for ETA selectivity. However,
because all these receptors are members of the
GPCR family, there is no assurance that any
bind as modeled. Thus, they must be classified
as type II peptidomimetics until structural
data can resolve the issue.


10 SUMMARY AND FUTURE 
DIRECTIONS 

The "Holy Grail" of peptidomimetic research 
in drug discovery has been to find ways to
transform the structural information contained
in peptides into non-peptide structures
that have drug-like pharmacodynamic properties.
Many different strategies have been
employed in the search for useful
peptidomimetics: rational design of amide bond
replacements, mimics of turn structures, and
the like, as well as both designed and
discovered scaffolds that replace the amide bond
core of peptides. The field has a long way to go
before rational design of type III
peptidomimetics can be achieved routinely.
However, the progress made to date suggests that
this goal will be achieved. We know that some
non-peptide scaffolds are topographical mimetics
of the extended β-strand of enzyme-bound
protease inhibitors because we have the
biophysical methods for characterizing both
types of enzyme-inhibitor complexes. Type III
peptidomimetic inhibitors of peptidases have
been designed from the substrate sequences,
and they have been revealed by HTS processes
and optimized by application of structural
biology. At this point, we have learned more
about the design of inhibitors by studying how
screening leads inhibit enzymes than from the
design of inhibitors from our current, limited
knowledge of enzyme catalysis. Probably the
most important recent discovery is that some
screening leads inhibit proteases by binding to
a different enzyme active site conformation
that is related mechanistically to the
well-characterized extended β-strand of
enzyme-bound protease inhibitors. This result
emphasizes the importance of considering the
entire ensemble of protein conformations when
designing inhibitors of peptide-protein
interactions.

Figure 15.55. GHRP-6 and ghrelin non-peptide derivatives as growth
hormone secretagogues inhibitors.
GHRP-6: His-D-Trp-Ala-Trp-D-Phe-Lys-NH2.
Ghrelin: Gly-Ser-Ser-Phe-Leu-Ser-Pro-Glu-His-Gln-Arg-Val-Gln-Gln-
Arg-Lys-Glu-Ser-Lys-Lys-Phe-Phe-Ala-Lys-Leu-Gln-Phe-Arg, bearing a
CO(CH2)6-CH3 acyl group. Structures labeled: MK-0677 (132); (133).

Figure 15.56. Newer approaches to growth hormone secretagogues
inhibitors. Structures labeled: NN-703 (134); (135).

Figure 15.57. Non-peptide endothelin analogs.
Endothelin-1 (as drawn, in two rows): Cys-Ser-Cys-Ser-Ser-Leu-Met;
Trp-Ile-Ile-Asp-Leu-His-Cys-Phe-Tyr-Val-Cys-Glu-Lys-Asp.
Structures labeled: (137); (138).

Figure 15.58. Examples of ETA inhibitors.
Structure labeled: TBC3214 (143).

Our understanding of peptide mimicry for 
ligands of constitutive receptors, such as G- 
protein-coupled receptors (GPCR), is much 
more primitive because high resolution struc¬ 
tural data for agonist- and/or antagonist-re¬ 
ceptor complexes are not yet available. For 
this reason, all attempts to rationalize the in¬ 
teractions between ligand and receptor con¬ 
tain a considerable element of speculation. It 
is too early to know whether small non-pep- 
tide structures that bind to GPCR are func¬ 
tional or topographical mimetics. However, 
based on the results obtained by studying pep¬ 
tidase inhibitors, it seem likely that at least 
some of the known functional peptidomimet- 
ics receptors ligands will be shown to be topo¬ 
graphical mimetics. Others may be found to 
act more like GRAB-peptidomimetics in that 
they bind to receptor conformations closely re¬ 
lated in energy and mechanism to native con¬ 
formations. Still others will no doubt be found 
that inhibit or stimulate the receptor system 
by allosteric mechanisms or by interfering 
with some multi-step binding process preced¬ 
ing the formation of the active ligand-receptor 
complex. In any case, it is clear that successful 
design of functional mimetics by assuming 
some structural relationship between a 
screening lead and the parent peptide can 
work (see Section 9.6), as can the systematic 
modification of the parent peptide. The appli¬ 
cation of the principles of peptidomimetic re¬ 
search has become very important to drug dis¬ 
covery. Although our present knowledge 


about protein-protein interactions is still 
quite limited, the rapid growth of structural 
information and methods will eventually al¬ 
low us to design rationally peptidomimetic 
compounds suitable for use in human therapy.

1 INTRODUCTION

This chapter is limited to nonprotein thera¬ 
peutic candidates. The subject of peptide ana¬ 
logs and peptidomimetic agents merits sepa¬ 
rate consideration. Contemporary search for 
new drugs makes extensive use of robotic 
techniques of combinatorial chemistry and 
high throughput synthesis, whereby huge 
numbers of compounds can be prepared for 
high throughput screening. However, this 
nonselective synthetic method based on a ran¬ 
dom screening philosophy should not replace 
the strategy of analog design, but rather it 
should be considered as a useful prelude to 
analog design. 

In any strategy aimed at designing new 
drug molecules or analogs of known biologi¬ 
cally active compounds, there are no absolute 
guidelines or rules for procedure; the knowl¬ 
edge, imagination, and intuition of the medic¬ 
inal chemist are the most important contribu¬ 
tors to success. Analog design is as much an 
art as it is a science. The concept of analog 
design presupposes that a lead has been dis¬ 
covered; that is, a chemical compound has 
been identified that possesses some desirable 
pharmacological property. The search for and 
identification of leads is a challenge and is a 
separate topic. It is sufficient for the present 
discussion to note that lead compounds are 
frequently identified as endogenous partici¬ 
pants (hormones, neurotransmitters, second 
messengers, or enzyme cofactors) in the 
body's biochemistry and physiology, or a lead 
may result from routine, random biological 
screening of natural products or of synthetic 
molecules that were created for purposes 
other than for use as drugs. 

Analog design is most fruitful in the study 
of pharmacologically active molecules that are 
structurally specific: their biological activity 
depends on the nature and the details of their 
chemical structure (including stereochemis¬ 
try). Hence, a seemingly minor modification of 
the molecule may result in a profound change 
in the pharmacological response (increase, di¬ 
minish, completely destroy, or alter the nature 
of the response). In pursuing analog design 


and synthesis, it must be recognized that the 
newly created analogs are chemical entities 
different from the lead compound. It is not 
possible to retain all and exactly the same sol¬ 
ubility and solvent partition characteristics, 
chemical reactivity and stability, acid or base 
strength, and/or in vivo metabolism proper¬ 
ties of the lead compound. Thus, although the 
new analog may demonstrate pharmacological 
similarities to the lead compound, it is not 
likely to be identical to it, either chemically or 
biologically, nor will its similarities and differ¬ 
ences always be predictable. 

The goal of analog design is twofold: (1) to
modify the chemical structure of the lead com¬ 
pound to retain or to reinforce the desirable 
pharmacologic effect while minimizing un¬ 
wanted pharmacological (e.g., toxicity, side ef¬ 
fects, or undesired routes of and/or unaccept¬ 
able rates of metabolism) and physical and 
chemical properties (e.g., poor solubility and 
solvent partition characteristics or chemical 
instability), which may result in a superior 
therapeutic agent; and (2) to use target ana¬ 
logs as pharmacological probes (i.e., tools used 
for the study of fundamental pharmacological 
and physiological phenomena) to gain better 
insight into the pharmacology of the lead mol¬ 
ecule and perhaps to reveal new knowledge of 
basic biology. Studies of analog structure-ac¬ 
tivity relationships may increase the medici¬ 
nal chemist's ability to predict optimum 
chemical structural parameters for a given 
pharmacological action. 

Analog design is greatly facilitated if the 
medicinal chemist can initially define the 
pharmacophore of the lead compound: that 
combination of atoms within the molecule 
that is responsible for eliciting the desired 
pharmacologic effect. Analog design may be 
directed toward maintaining this combination 
of atoms intact in a newly designed molecule 
or toward a carefully planned, systematic 
modification of the pharmacophore. If the me¬ 
dicinal chemist is uncertain about the struc¬ 
tural features of the pharmacophoric portion 
of the molecule, a prime initial goal of analog 
design should be to define the pharmacophore. 
The medicinal chemist should address the following
questions: What change(s) can be made
in the lead molecule that permit(s) retention
or reinforcement of pharmacological action?
and What change(s) can be made in the molecule
that diminish, destroy, or qualitatively
change the basic pharmacologic action? The
ideal program of analog design should involve
a single structural change in the lead molecule
with each new compound designed and syn¬ 
thesized. An analog in which multiple changes 
in the structure of the lead molecule have been 
made simultaneously may occasionally reveal 
highly desirable pharmacologic effects. How¬ 
ever, relatively little useful structure-activity 
information will be gained from such a mole¬ 
cule. It cannot be readily determined which 
change (or combination of changes) was re¬ 
sponsible for the change in the pharmacologi¬ 
cal effect. On a practical basis, it is frequently 
chemically impossible to effect only one dis¬ 
crete change in the lead molecule; one simple 
molecular structural alteration can influence 
many structural and chemical parameters. 
Nonetheless, the medicinal chemist should be 
cognizant of the disadvantages inherent in 
"shotgun" (nonsystematic, multiparametric) 
modification of lead molecules. 
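
The one-change-at-a-time principle described above can be sketched as a simple bookkeeping check. This is only an illustrative sketch: the feature names and the lead and analog descriptions below are hypothetical, not taken from any compound in this chapter, and a real program would rely on a proper chemical structure representation rather than hand-written labels.

```python
from typing import Dict, List

# A molecule is described here by a hypothetical set of named structural features.
Features = Dict[str, str]


def changed_features(lead: Features, analog: Features) -> List[str]:
    """Return the sorted names of features whose value differs from the lead."""
    keys = sorted(set(lead) | set(analog))
    return [k for k in keys if lead.get(k) != analog.get(k)]


def is_single_change(lead: Features, analog: Features) -> bool:
    """True when the analog differs from the lead in exactly one feature."""
    return len(changed_features(lead, analog)) == 1


# Hypothetical lead and two hypothetical analogs (assumptions for illustration).
lead = {"N_substituent": "methyl", "ring_heteroatom": "O", "chain_length": "2"}
analogs = {
    "A": {"N_substituent": "ethyl", "ring_heteroatom": "O", "chain_length": "2"},
    "B": {"N_substituent": "n-propyl", "ring_heteroatom": "S", "chain_length": "2"},
}

report = {name: changed_features(lead, a) for name, a in analogs.items()}
for name, changes in report.items():
    label = "single change" if len(changes) == 1 else "multiparametric ('shotgun')"
    print(f"analog {name}: {changes} -> {label}")
```

In this sketch, a change in activity for analog A can be attributed to the N-substituent alone, whereas for analog B it cannot be decided which of the two changes (or their combination) is responsible.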

In analog design, molecular modification of 
the lead compound can involve one or more of 
the following strategies: 

1. Bioisosteric replacement. 

2. Design of rigid analogs. 

3. Homologation of alkyl chain(s) or alter¬ 
ation of chain branching, design of aro¬ 
matic ring-position isomers, alteration of 
ring size, and substitution of an aromatic 
ring for a saturated one, or the converse. 

4. Alteration of stereochemistry, or design of 
geometric isomers or stereoisomers. 

5. Design of fragments of the lead molecule 
that contain the pharmacophoric group 
(bond disconnection). 

6. Alteration of interatomic distances within 
the pharmacophoric group or in other parts 
of the molecule. 

None of these strategies is inherently pref¬ 
erable to the others; all merit the medicinal 
chemist's attention and consideration. Appli¬ 


cation of a combination of these strategies to 
the lead molecule may be advantageous. Considering
the permutations and combinations
of these changes that are possible
within a single lead molecule, it is obvious that
the number of analogs that can be designed 
from a lead molecule is potentially extremely 
large. Some structural changes that might be 
proposed are chemically impracticable (e.g., 
the molecule is incapable of existence) or the 
proposed analog may represent an over¬ 
whelmingly formidable synthetic challenge. 
These negative factors will diminish the pop¬ 
ulation of possible analogs to be considered for 
synthesis; nevertheless, the medicinal chemist 
will always be confronted with a multitude of 
possible target molecules. Rational decisions 
must be made concerning which compounds 
should be synthesized, and synthetic priorities 
must be established for target compounds. All 
other factors being equal, the medicinal chem¬ 
ist should synthesize the less-challenging 
compounds first. Beyond this truism, the me¬ 
dicinal chemist's best resources are intuition 
and imagination. Selection and application of 
specific molecular modification strategies de¬ 
pend on the chemical structure of the lead 
compound and, to a certain extent, on the 
pharmacological action to be studied. 
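
The growth in the number of candidate analogs can be illustrated with a short counting sketch. The six strategy names follow the list above; the assumption that each strategy offers a fixed number of independent alternatives (three, in the example) is purely hypothetical and is used only to show how quickly the total rises.

```python
from itertools import combinations
from typing import Dict, List, Tuple

STRATEGIES = [
    "bioisosteric replacement",
    "rigid analogs",
    "homologation / ring changes",
    "stereochemistry",
    "fragments of the lead",
    "interatomic distances",
]


def strategy_combinations(strategies: List[str]) -> List[Tuple[str, ...]]:
    """All non-empty subsets of strategies that could be applied together."""
    return [
        combo
        for size in range(1, len(strategies) + 1)
        for combo in combinations(strategies, size)
    ]


def count_analogs(options: Dict[str, int]) -> int:
    """Count analogs when each strategy independently offers k alternatives
    or is left unapplied; the unmodified lead itself is excluded."""
    total = 1
    for k in options.values():
        total *= k + 1
    return total - 1


n_combinations = len(strategy_combinations(STRATEGIES))
# Hypothetical assumption: three alternatives per strategy.
n_analogs = count_analogs({name: 3 for name in STRATEGIES})
print(n_combinations, n_analogs)
```

Even under this modest assumption the count runs into the thousands, which is why synthetic priorities must be set rather than every conceivable analog being prepared.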

All of the strategies of analog design as well 
as subsequent decisions concerning target 
compounds to be synthesized can be facili¬ 
tated by the use of computational chemistry 
(computer-assisted molecular modeling) tech¬ 
niques. These may give the medicinal chemist 
further insights into structural, stereochemi¬ 
cal, and electronic implications of the pro¬ 
posed molecular modification. 

2 BIOISOSTERIC REPLACEMENT AND 
NONISOSTERIC BIOANALOGS 
(NONCLASSICAL BIOISOSTERES) 

The concept of bioisosterism derives from Langmuir's (1) observation that certain physical properties of chemically different substances (e.g., carbon monoxide and nitrogen, ketene and diazomethane) are strikingly similar. These similarities were rationalized on the basis that carbon monoxide and nitrogen both have 14 orbital electrons and, similarly, diazomethane and ketene both have 22 orbital electrons. Medicinal chemists have expanded and adapted the original concept to the analysis of biological activity. The following definition has been proposed: "Bioisosteres are groups or molecules which have chemical and physical properties producing broadly similar biological properties" (2). This definition might be modified to include the concept that bioisosteres may produce opposite biological effects, and these effects are frequently a reflection of some action on the same biological process or at the same receptor site. Bioisosteric similarity of molecules is commonly assigned on the basis of the number of valence electrons of an atom or a group of atoms rather than on the total number of orbital electrons, as was originally specified by Langmuir. In a remarkable number of instances, compounds result that have similar (or even diametrically opposite) pharmacological effects compared with those of the parent compound. Categories of classic bioisosteres have been described (2) (Table 16.1).

Table 16.1 Bioisosteric Atoms and Groups

1. Univalent
   -F   -OH   -NH2   -CH3
   -Cl   -SH   -PH2
   -I   t-C4H9
   -Br   i-C3H7

2. Bivalent
   -O-   -S-   -Se-   -CH2-   -NH-

3. Tervalent
   -N=   -CH=   -P=   -As=

4. Quadrivalent
   -C-   -Si-

5. Ring equivalents
   -CH=CH-   -S-   (e.g., benzene, thiophene)
   =CH-   =N-   (e.g., benzene, pyridine)
   -O-   -S-   -CH2-   -NH-

A more recent comprehensive review of 
bioisosterism appeared in 1996 (3). In a short
communication, Burger (4) discussed and pro¬ 
vided valuable insights into isosterism and 
bioanalogy in drug design. 
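
The electron counts behind Langmuir's isosteric pairs, and the valence-electron counts on which bioisosteric similarity is now more commonly assigned, can be checked with simple arithmetic. The sketch below assumes neutral molecules with the usual molecular formulas (CO, N2, C2H2O for ketene, CH2N2 for diazomethane); the dictionary and function names are illustrative only.

```python
from typing import Dict

ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}      # total electrons per neutral atom
VALENCE_ELECTRONS = {"H": 1, "C": 4, "N": 5, "O": 6}  # outer-shell electrons per atom

# Elemental composition of each neutral molecule.
MOLECULES: Dict[str, Dict[str, int]] = {
    "carbon monoxide": {"C": 1, "O": 1},
    "nitrogen": {"N": 2},
    "ketene": {"C": 2, "H": 2, "O": 1},
    "diazomethane": {"C": 1, "H": 2, "N": 2},
}


def electron_count(composition: Dict[str, int], per_atom: Dict[str, int]) -> int:
    """Sum the per-atom electron contribution over all atoms in the molecule."""
    return sum(per_atom[element] * n for element, n in composition.items())


total = {name: electron_count(c, ATOMIC_NUMBER) for name, c in MOLECULES.items()}
valence = {name: electron_count(c, VALENCE_ELECTRONS) for name, c in MOLECULES.items()}

for name in MOLECULES:
    print(f"{name}: {total[name]} total electrons, {valence[name]} valence electrons")
```

Each pair matches on both counts, so for these examples the total-electron and valence-electron criteria lead to the same grouping.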

Many compounds have been identified that 
comply with the "biology" aspect of the bio- 
isostere concept but that do not fit the strict 
chemical (steric and electronic) definition of 


bioisosteres. Floersheim et al. (5) proposed 
that such compounds be designated as 
nonisosteric bioanalogs, replacing the older 
term, "nonclassical bioisosteres." However, 
most of the contemporary literature retains 
the nonclassical bioisostere terminology. Ta¬ 
ble 16.2 lists representative nonclassical bio¬ 
isosteres. 

Dihydromuscimol (1) and thiomuscimol (2) are cyclic analogs of γ-aminobutyric acid (GABA) (3), in which the C=N moiety of the heterocyclic ring is considered to be bioisosteric with the C=O of GABA. The -S- moiety of thiomuscimol is bioisosteric with the ring -O- of dihydromuscimol. Both (1) and (2) are highly potent agonists at GABA-A receptors, as determined in an electrophoresis-based assay (6).

[Structures (1)-(3): dihydromuscimol, thiomuscimol, and GABA]

Because of its bioisosteric similarity to the 
normal physiological substrate L-dopa (4), 
L-mimosine (5) inhibits catechol oxidation by 
the enzyme tyrosinase (7). These compounds 
exemplify a situation in which bioisosteres dis¬ 
play opposite pharmacologic effects at the 
same receptor. 

The sulfonium bioisostere (6) of N,N-dimethyldopamine (7) retains the dopaminergic agonist effect displayed by (7) (8). The fact that (6) bears a permanent unit positive charge was invoked in support of the hypothesis that β-phenethylamines such as (7) interact with dopamine receptors in their protonated (cationic) form.

[Structures (4)-(7): L-dopa, L-mimosine, the sulfonium bioisostere, and N,N-dimethyldopamine]

Bioisosteric replacement strategy has been fruitful in design of psychoactive agents, by use of the antidepressant dibenzazepine derivative imipramine (8) as the lead. The structural similarity between imipramine and the phenothiazine antipsychotics [typified by chlorpromazine (9)] is apparent. Although these two bioisosteric molecules have different pharmacological properties and therapeutic uses and likely have different mechanisms and sites of action in the central nervous system (9), they share the property of being psychotropic agents. They illustrate the observation that bioisosteric manipulation of a molecule may change its mode of action. In the antidepressant dibenzocycloheptene derivative amitriptyline (10), the ring nitrogen of imipramine is replaced by an exocyclic olefin moiety. Demexiptiline (11), doxepin (12), and dothiepin (13) represent other bioisosteric modifications of imipramine that possess antidepressant activity (10). Variations in the precise nature of psychotropic effects manifested by compounds (8-13) may be ascribed to the marked changes in orientation in space of the two benzene ring components of the tricyclic portion of these molecules, imposed on them by the isosteric moieties (-CH=CH-, -CH2-CH2-, -S-, -CH2O-, -CH2S-). The Z-isomer of doxepin is a more potent antidepressant than the E-isomer, but the drug is marketed as a mixture of isomers (11). Doxepin is also a potent antagonist at histamine H1 receptors. The Z-isomer is somewhat more potent than the E-isomer against histamine in the guinea pig ileum (12). The identity of the geometric isomer of dothiepin (13) used in pharmacological testing is uncertain. Apparently, attempts were made (13) to isolate the E- and Z-isomers of all of the compounds prepared in the series studied, but no information was provided about the stereochemistry of the dothiepin material used in the pharmacological studies.

[Structures (8)-(13): imipramine, chlorpromazine, amitriptyline, demexiptiline, doxepin, and dothiepin]

Replacement of the entire indole ring system of melatonin (14) by a naphthalene ring (15) permitted retention of binding affinity in an ovine pars tuberalis membrane assay (14).

[Structures (14) and (15)]

From a study (15) of a series of muscarinic M1 agonists derived from the structure of arecoline (16) and typified by (17), it was concluded that the Z-N-methoxyimidoyl nitrile group serves as a stable methyl ester bioisostere. The Z-isomer has an 18-fold higher affinity than its E-isomer for the rat cerebral cortex tissue used in the binding studies.

[Structures (16) and (17)]

Replacement of the methyl ester moiety of the muscarinic partial agonist arecoline (16) by the putative nonclassical bioisosteric 1,2,4-oxadiazole ring system [(18), where R = unbranched alkyl] permits retention of muscarinic agonism (16).

[Structure (18)]

The 1,2,4-oxadiazole ring system of quisqualic acid (19), an agonist at a subpopulation of glutamate receptors (17), can be considered to be a nonclassical bioisostere of the corresponding carboxyl group of glutamic acid (20).

[Structures (19) and (20): quisqualic acid and glutamic acid]

Compounds (21-23) illustrate further examples of nonclassical bioisosteres. Compound (21) was reported to display antitrypanosomal activity (18). The analogs (22) and (23) also displayed antitrypanosomal activity (19). Compound (22) demonstrated the most impressive activity (IC50 values of 40 and 165 nM) with respect to potency and effect on two arsenic-resistant strains of the organism.

Although the strategy of bioisosteric re¬ 
placement may be a powerful and highly pro¬ 
ductive tool in analog design, Thornber (2) has 
emphasized that fundamental chemical and 
physical chemical changes can be expected to 
result from these molecular modifications, 
which may in themselves profoundly affect the 
pharmacological action of the resulting molecules.
Contributing factors include change in
the size of the atom or group introduced,
which may affect the overall shape and size of
the molecule; changes in bond angles; change
in partition coefficient; change in the pKa of
and chemical stability of the molecule, with 
accompanying qualitative and quantitative al¬ 
teration of in vivo metabolism of the molecule; 
and change in hydrogen-bonding capacity. 
The chemical and biological results and phar¬ 
macological significance of many of these fac¬ 
tors are unpredictable and must be deter¬ 
mined experimentally. 

3 RIGID OR SEMIRIGID
(CONFORMATIONALLY RESTRICTED)
ANALOGS

Imposition of some degree of molecular rigidity on a flexible organic molecule (e.g., by incorporation of elements of the flexible molecule into a rigid ring system or by introduction of a carbon-carbon double or triple bond) may result in potent, biologically active agents that show a higher degree of specificity of pharmacologic effect. There are possible advantages to this technique (20): the key functional groups are held in one steric disposition or, in the case of a semirigid structure, the key functional groups are constrained to a limited range of steric dispositions and interatomic distances. By the rigid analog strategy, it is possible to approximate "frozen" conformations of a flexible lead molecule that, if an enhanced pharmacological effect results, may assist in defining and understanding structure-activity parameters, including the three-dimensional geometry of the pharmacophore. These data may be useful in constructing a model of the topography of the receptor. Computational chemistry strategies can be a valuable tool in designing rigid analogs.

[Structures (21)-(23); in (21), R = R' = CH2-C6H5]

The semirigid tetralin congeners (24) and (25) of N,N-dimethyldopamine (7) represent the two rotameric conformational extremes of the spatial relationship of the aromatic ring of dopamine to the ethylamine side chain when the ring and the side chain are coplanar. Compounds (24) and (25) display effects at different subpopulations of dopamine receptors (21), which have been proposed to reflect different conformations assumed by the flexible dopamine molecule at its various in vivo sites of action.

[Structures (24) and (25)]

Restriction of conformational freedom of the acyl moiety in 4-DAMP [(26), an antimuscarinic compound displaying higher affinity at ileal M3 acetylcholine receptors than at atrial M2 receptors] was imposed by the structure of the spiro-compound (27) (22).

[Structures (26) and (27)]

Spiro-DAMP (27) was slightly more potent at M2 muscarinic receptors than at M3 receptors. It was proposed that the geometry of the spiro-molecule might reflect the receptor-bound conformation of 4-DAMP (26); this conformation differs from that observed in the crystal structure of 4-DAMP.

Conformational restriction was introduced into the side chain of a nonclassical serotonin bioisostere [(28), a selective 5-HT„ and 5-HT„ receptor agonist] by its incorporation into a fused six-membered ring (29) (23).

[Structures (28) and (29)]

The conformational restrictions imposed on the indole-3-ethylamine moiety permitted retention of affinity for the 5-HT„ receptor but it diminished affinity for the 5-HT„ receptor by a factor of 1000. In two functional assays, (29) exhibited potency equal to or marginally greater than that of serotonin. Compound (29) was described as a partial agonist. It was concluded that the conformation of the indole-3-ethylamine portion of the fused system (29) reflects the conformation assumed by the flexible system (28) when it binds to the 5-HT„ receptor.

Imposition of rigidity into the piperidine ring of the opioid analgesic meperidine (30) by introduction of a methylene bridge between carbons 2 and 5 resulted in epimers (31) and (32), representing "frozen" conformations of meperidine (24).

[Structures (30)-(32)]

The exo-phenyl isomer (32) was six times as potent as the endo-phenyl isomer (31), and it was twice as potent as meperidine itself in a benzoquinone-induced writhing assay for analgesic effect.

Rigid analogs (33), (34), and (35) of phencyclidine (36) possess a rigid carbocyclic structure and an attached piperidine ring that is free to rotate. All three rigid analogs showed low to no affinity for the PCP receptor, but they had good affinity in a σ-receptor-binding assay (25). These binding data were proposed to be useful in defining a model for the σ-receptor pharmacophore. This study also provided additional evidence that the σ-receptor is independent of the PCP-binding site (cf. Ref. 26 and references therein).

[Structures (33)-(36)]


Incorporation of the choline portion of acetylcholine (37) into a cyclopropane ring system resulted in cis- and trans-1,2-disubstituted molecules, (38) and (39), in which the acetylcholine molecule is locked into folded ("cisoid") and extended ("transoid") conformations.

[Structures (37)-(39); (37) is acetylcholine, CH3-C(=O)-O-CH2-CH2-N+(CH3)3; (38) and (39) are the cis- and trans-cyclopropane analogs]

The (1S),(2S)-(+)-trans-isomer (39) was somewhat more potent than acetylcholine itself in tissue and whole-animal assays for muscarinic agonism (27) and it was an excellent substrate for acetylcholinesterase. The (1R),(2R) enantiomer of (39) was exponentially less potent than its (1S),(2S) enantiomer in the assays cited, but it was a good substrate for acetylcholinesterase. The (±)-cis-isomer (38) was almost inert at nicotinic and muscarinic receptors and it was a poor substrate for acetylcholinesterase. These data were taken as evidence that the flexible acetylcholine molecule interacts with muscarinic receptors in an extended geometry of the chain of atoms (28). When this semirigid analog strategy was applied to a cyclobutane ring system (compound 40), there was a marked loss of pharmacologic effect (29). This result is enigmatic; differences in interatomic distances and bond angles in the pharmacophoric moiety as well as differences in extraneous molecular bulk seem insufficient to account for the dramatic difference in pharmacological potencies between the three- and the four-membered ring systems.

The cyclopropane ring was employed to impart a degree of rigidity to the side chain of dopamine (structures 41 and 42) (30).

[Structures (41) and (42)]

Neither isomer displayed effects at dopamine receptors, but both were α-adrenoceptor agonists, with the (±)-trans-isomer (41) being approximately five times more potent than the (±)-cis-isomer (42). It was suggested (31) that these findings may contribute to determining the preferred conformation of β-phenethylamines at the α-adrenoceptor. The racemic trans-cyclobutane congeners (43a) and (43b) are more potent than their racemic cis-isomers (44a) and (44b) in binding studies on rat corpus striatum tissue, but the binding affinities for (43a) and (43b) are much less than that of dopamine (32). Racemic trans-(43a) was more potent than the trans-primary amine (43b), but it was still much less potent than dopamine. The racemic cis-isomer of (44b) demonstrated very low affinity for the receptor.

[Structures (43a), (43b): R = R' = CH3 and R = R' = H, respectively; structures (44a), (44b): R = R' = CH3 and R = R' = H, respectively]

A β-phenethylamine moiety was incorporated into the trans-decalin ring system (45) and the racemic modifications of all four possible isomers were prepared as "frozen" analogs of possibly significant conformations of the flexible norepinephrine molecule (33). All four compounds displayed approximately equal (extremely low) potency. This result illustrates that the achievement of conformational integrity by incorporation of a flexible pharmacophore into a bulky, complex molecule may be at the expense of biological activity.

[Structure (45)]

Rigidity was introduced into the glutamic acid moiety in a series of bioisosteric congeners (46-48) (34). These systems showed potent agonist activity at subpopulations of metabotropic glutamate receptors. The geometry of these congeners led to the conclusion that glutamic acid itself interacts with the metabotropic glutamate receptors in a fully extended conformation.

[Structures (46)-(48); in (48), X = S]

The rotational orientation of the ester moieties of the myoneural blocking agent succinyldicholine (49) was restricted by introduction of a double bond into the succinic acid portion (50), (51) (35). The E-fumarate ester (51) was approximately one-half as potent as the flexible succinate ester (49), whereas the Z-maleate ester (50) was 1/40 as potent as the succinate. These results led to the conclusion that the molecular shape of the E-ester (51) more closely approximates that assumed by succinylcholine when it interacts with myoneural nicotinic receptors.

[Structures (49) and (50); (49) is (CH3)3N+-CH2-CH2-O-C(=O)-CH2-CH2-C(=O)-O-CH2-CH2-N+(CH3)3, and (50) is the corresponding Z-olefinic (maleate) ester]

Restricted rotation was also introduced into the succinic acid moiety of succinyldicholine by preparation of the choline esters of cis- and trans-cyclopropane-1,2-dicarboxylic acids (52) and (53) (36, 37). Myoneural blocking activity was assessed in dogs (37) and cats (36) and, as indicated above for the E- and Z-olefinic esters (51) and (50), the extended trans-isomer (53) demonstrated much greater potency and a longer duration of action than those of the cis-isomer (52). The cyclobutane congeners (54) and (55) presented unexpected results that are difficult to rationalize: the cis-isomer (54) was much less potent than the trans-isomer (55) in a cat assay for myoneural blockade, but it presented a decidedly longer duration of action than that of the trans-isomer (36).


4 HOMOLOGATION OF ALKYL CHAIN 
OR ALTERATION OF CHAIN BRANCHING; 
CHANGES IN RING SIZE; RING-POSITION 
ISOMERS; AND SUBSTITUTION OF AN 
AROMATIC RING FOR A SATURATED 
ONE, OR THE CONVERSE 

Change in size or branching of an alkyl chain on a bioactive molecule may have profound (and sometimes unpredictable) effects on physical and pharmacological properties. Alteration of the size and/or shape of an alkyl substituent can affect the conformational preference of a flexible molecule and may alter the spatial relationships of the components of the pharmacophore, which may be reflected in the ability of the molecule to achieve complementarity with its receptor or with the catalytic surface of a metabolizing enzyme. The alkyl group itself may represent a binding site with the receptor (through hydrophobic interactions), and alteration of the chain may alter its binding capacity. Position isomers of substituents (even alkyl groups) on an aromatic ring may possess different pharmacological properties. In addition to their ability to affect electron distribution over an aromatic ring system, position isomers may differ in their complementarity to receptors, and the position of a substituent on a ring may influence the spatial occupancy of the ring system with respect to the remainder of a conformationally variable molecule. What has sometimes been trivialized as "methyl group roulette" may indeed be an important parameter in the design of analogs.

[Structure (51): the E-olefinic (fumarate) bis-choline ester, (CH3)3N+-CH2-CH2-O-C(=O)-CH=CH-C(=O)-O-CH2-CH2-N+(CH3)3]


K 


(CH 3 ) 3 N—CH 2 —CH 2 —o—c C—0—CH 2 —CH 2 -N(CH 3 ) 3 

0 o 

(52) 


o 


w 


+ 


H a C—O—CH 2 —CH 2 —N(CH 3 ) 3 


o 


(CH 3 ) 3 N— ch 2 -ch 2 - 


0—C H 

(I 

o 

(53) 


H . H 



(CH 3 ) 3 N—CH 2 —CH 2 —O—C C—0—CH 2 —CH 2 —N(CH 3 ) 3 

o o 


(54) 


0 


H c— O—CH 2 —CH 2 — N(CH 3 ) 3 



(CH 3 ) 3 N—CH 2 -CH 2 —0—C H 

0 


( 55 ) 
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Homologation of the N-alkyl chain in 
norapomorphine (56)from methyl (57) to n- 
propyl (59) produced incremental increases in 
emetic response in dogs and in stereotypy re¬ 
sponses in rodents (38, 39). 



R 

(56) R = H 

(57) R = CH 3 

(58) R = C 2 H 5 

(59) R = n-C 3 H 7 

(60) R = n-C 4 H 9 

The next member of the series,the n-butyl 
homolog (60), demonstrated a tremendous 
loss in potency and activity compared to that 
cf the lower homologs (39). Studies of N^N- 
dialkyl dopamines (61 -64)revealed that some 


OH 



rats. It seems likely that the enhanced dopa¬ 
minergic agonist effects conferred by N-ethyl 
and N-n-propyl groups on aporphine and 
/B-phenethylamine-derived molecules are not 
related merely to enhanced lipophilic charac¬ 
ter or to partitioning phenomena, but rather 
to the likelihood that the two- and three-car¬ 
bon chains have a positive affinity for subsites 
on certain dopamine receptors. It may be spec¬ 
ulated that these receptor subsites do not ac¬ 
commodate longer alkyl chains (e.g., n-butyl 
or n-pentyl). However, different assays for do¬ 
paminergic stimulant effects and different an¬ 
imal species were used in refs. 41, 42, and 43, 
and care must be exercised in drawing firm 
structure-activity relationship conclusions 
based on these data. 

The alkyl linker between the two heterocy¬ 
clic ring systems in structure (65) was modi- 


(61) R = R' = CH 3 

(62) R = R' = n-C 3 H 7 

(63) R = n-C 3 H 7 ; R' = rc-C 4 H 9 

(64) R = R' = n-C 4 H 9 

combinations of alkyl groups may impart a 
high degree of dopamine agonist effects (40). 

NJV-dimethyldopamine (61) is extremely 
potent in assays for dopaminergicagonism (pi¬ 
geon pecking, emesis in dogs, and inhibition of 
cat cardioaccelerator nerve), as is N^N-di-n- 
propyldopamine (62) (41). N-n-Propyl-N-n- 
butyldopamine (63) is potent in behavioral as¬ 
says in nigra-lesioned rats (42). However, 
NJV-di-n-butyldopamine(64) is virtually inert 
in these assays (41, 42). IV,IV-di-tt-Pentyldo- 
pamine was reported (43) to be inert in a 
caudectomized mouse behavioral assay and in 
a rotatory behavioral assay in nigra-lesioned 


(66) linker = Y = H 

(67) linker = Y = H 

CH 

(68) linker = Y = H 

(69) linker = Y = H 

CH 3 

fied in studies of the ability of analogs to bind
to the cholecystokinin-B receptor (44). When 
this linking group was -CH 2 —CH 2 -, the com¬ 
pound (structure 66) was extremely potent in 
radioligand displacement assays on mouse 
brain membranes. Introduction of carbon-carbon
unsaturation (E-olefin) into the linker
(structure 67) resulted in a 16-fold decrease in 
binding ability; this suggests that conforma¬ 
tional restriction and limitation of molecular 
flexibility have deleterious effects on biologi¬ 
cal activity. However, no data were reported 
on the Z-isomer of this olefinic molecule, so 
that caution should be exercised in drawing 
conclusions. Introduction of a bromine sub¬ 
stituent (65, Y = Br) into (66) produced a 
threefold increase in potency, whereas the 
same structural modification of the olefin (67) 
resulted in a threefold decrease in potency. 

Branching the linker chain with a methyl 
group adjacent to the quinazolinone ring (68) 
resulted in a 350-fold decrease in affinity. 
However, chain branching with a methyl 
group in the alternate position on the ethylene 
chain produced compound (69), whose recep¬ 
tor affinity was of the same order of magnitude 
as the extremely potent lead compound (66). 
The exponential difference in receptor-bind¬ 
ing ability exhibited by the two isomeric 
branched-chain linker compounds (68) and 

(69) was ascribed to unfavorable steric inter¬ 
actions between the receptor and the linker 
methyl group of (68) (44). This conclusion may 
be compromised by the fact that both (68) and 
(69) were evaluated as their racemates. 
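
The fold changes in binding quoted above for the linker analogs can be put on a free-energy scale. The following is a minimal sketch, assuming that each reported fold change equals the ratio of equilibrium binding constants and that the assays were run near 25 °C; neither assumption is stated in the text, and the function name is illustrative only.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fold_to_ddg(fold_change: float, temp_k: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) equivalent to a fold change in affinity.

    Uses ddG = R * T * ln(fold change). Treating a reported fold change as a
    ratio of binding constants is an assumption made for this illustration.
    """
    return R_KCAL * temp_k * math.log(fold_change)


# Fold losses in binding relative to lead compound (66), as quoted in the text.
reported = {
    "(67), olefinic linker": 16.0,
    "(68), methyl adjacent to quinazolinone": 350.0,
}

for label, fold in reported.items():
    print(f"{label}: {fold:g}-fold weaker, roughly {fold_to_ddg(fold):.1f} kcal/mol")
```

On this scale the 16-fold loss for (67) and the 350-fold loss for (68) correspond to differences of only a few kcal/mol, which is one reason small steric or conformational changes can produce such large changes in measured affinity.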

A study (45) of 2-(phosphonomethoxy)ethylguanidines
(70-73) as antiviral (herpes and


ished toxicity 16-fold compared to that of the 
nonmethylated system (70). 

In contrast, (R)-(71), the 2'-methyl conge¬ 
ner, exhibited only a fivefold decrease in anti¬ 
viral potency compared to that of compound 

(70), but it also exhibited a 30-fold lessening of
toxicity, to produce a substantial increase in 
therapeutic index over that of (70). The (S)- 

(71) enantiomer was somewhat less potent 
than its (R)-enantiomer. The gem-dimethyl 
congener (73) was also somewhat less potent 
than the (R)-2'-monomethyl compound (71) 
and it was markedly more toxic. The (S)-2'-methyl
stereoisomer of (71) exhibited a decid¬
edly lower therapeutic index than that of its 
(R)-enantiomer. 
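
The effect of these changes on therapeutic index follows directly from the quoted fold changes. The sketch below assumes the therapeutic index is the simple ratio of toxic dose to effective dose, so that the index of an analog relative to the parent (70) is the fold reduction in toxicity divided by the fold reduction in potency; the function name is hypothetical and the numbers are those given in the text.

```python
def relative_therapeutic_index(potency_loss_fold: float, toxicity_loss_fold: float) -> float:
    """Therapeutic index of an analog divided by that of the parent compound.

    Assumes TI = toxic dose / effective dose. A k-fold loss of potency raises
    the effective dose k-fold; an m-fold loss of toxicity raises the toxic dose
    m-fold, so the index changes by a factor of m / k.
    """
    return toxicity_loss_fold / potency_loss_fold


# (potency loss, toxicity loss) relative to the nonmethylated compound (70)
analogs = {
    "racemic (72), 1'-methyl": (25.0, 16.0),
    "(R)-(71), 2'-methyl": (5.0, 30.0),
}

for name, (potency_loss, toxicity_loss) in analogs.items():
    ratio = relative_therapeutic_index(potency_loss, toxicity_loss)
    print(f"{name}: therapeutic index changes by a factor of {ratio:.2f}")
```

Under this assumption the 1'-methyl compound (72) is slightly worse than (70), whereas (R)-(71) gains about sixfold, consistent with the "substantial increase" described above.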

Closely related to alteration of chain 
length and/or chain branching is alteration 
of ring size. Compound (74) showed nano- 



(74) n = 1; m = 2 

(75) n = m = 1 

(76) n = 1; m = 3 

(77) n = 1; m = 4 

(78) n = 2; m = 2 

(79) n = 2; m = 3


O 



R" R 


(70) R = R' = H 

(71) R = H; R' = CH 3 

(72) R = CH 3 ; R' = H 

(73) R = R' = H; R = gem-dimethyl 


HIV) agents revealed that branching of the 
ethylene chain by introduction of a methyl 
group at the 1'-position (as in racemic 72) di¬
minished antiviral activity 25-fold and dimin¬ 


molar-level activity as an inhibitor of 5-li¬ 
poxygenase (46). The size of the oxygen-con¬ 
taining ring as well as the position of the 
oxygen member with respect to the methoxy 
and aryl substituents was varied. The (sev- 
en-membered) oxepane ring derivative (79) 
and the (six-membered) tetrahydropyran 
ring derivative (78) showed two- to 10-fold 
enhanced potency over that of the tetrahy- 
drofuran lead compound (74). The other an¬ 
alogs shown demonstrated much weaker en¬ 
zyme inhibitory activity. 

In a series of spiro-tetraoxacycloalkanes 
(80), with varying heterocyclic ring sizes, it 
was found that the compound where n = 1 
demonstrated marked antimalarial activity 
against P. berghei and P. falciparum, and
showed low toxicity (47). The analog in which 






n = 4 showed strong activity against P. falciparum
but it was unimpressive in the P. berghei
assay.

In a series of arylsulfonamidophenethanolamines
(81) (48), derivatives bearing the sul-

H 





H—N 

x so 2 -ch 3 

(81) 

fonamido group meta to the ethanolamine side
chain displayed properties of a β-adrenoceptor
partial agonist, whereas 19 compounds bearing
the sulfonamido group in the para position
were β-antagonists.

Changing the positions of attachment of 
the two benzene rings linking the quinolinium 
moieties of the calcium-activated potassium 
channel blocker (82) reduced activity 10- to 


A 



L 


( 82 ) 


60-fold in a rat superior cervical ganglion as¬ 
say (49). Other structural variations studied 
included benzene ring A meta-substituted and 
benzene ring L meta-substituted; benzene 
ring A meta-substituted and benzene ring L 
para-substituted; and benzene ring A para- 
substituted and benzene ring L para-substi¬ 
tuted. All of these variations were much less
potent than (82).

The phenolic group of serotonin (83) was 
incorporated into a pyran ring (84) (50), thus 



I 

H 

(83) 



also introducing an alkyl substituent at posi¬ 
tion 4 of the indole ring system. 

This tricyclic analog (84) lacked serotonin¬ 
like affinity for 5-HT, receptors, but it demon¬ 
strated high and selective affinity for 5-HT, 
receptors. Like serotonin, it stimulated phos¬ 
phatidyl inositol turnover in rat brain slices. 
The low affinity for 5-HT, receptors was ratio¬ 
nalized, in part, on the basis of steric interfer¬ 
ence between the dihydropyran ring and the 
aminoethyl side chain, which inhibits the 
tryptamine system from assuming the folded 
ergotlike conformation, as illustrated in (85), 
which probably approximates the conforma¬ 
tion of serotonin at 5-HT, receptors. The 
methyl ether of serotonin exhibits approxi¬ 
mately the same affinity for 5-HT„ sites as 
does serotonin (51). The methyl ether also has 
marked affinity for 5-HT„ and5-HT„ recep¬ 
tors, but it has diminished affinity (compared 
with serotonin) at 5-HT„ receptors. It was
R




(88) R = C 6 H 5 

(89) R = c-C 6 H 11


In a study of anticonvulsant agents, the 
(S)-benzene ring analog (90) was somewhat 
more potent in a mouse assay than was the 
(S)-cyclohexane analog (91) (56). There was 


suggested (50) that the high affinity for the 
5-HT, receptor exhibited by such compounds 
as (84) demonstrates that the C-5 hydroxyl 
group of serotonin can function as a hydrogen- 
bond acceptor at the receptor. 

Replacement of the benzene ring of the po¬ 
tent indirect acting central noradrenergic 
stimulant methamphetamine (86) by a cyclo¬ 
hexane ring (compound 87) results in some 

ch 2 —ch-ch 3 

N 

/ 

H CH 3 
( 86 ) 



/\.CH 2 -CH-CH 3 
I N 

A 

H CH 3 
(87) 

loss of pressor effect, but the drug, like am¬ 
phetamine, has been used as a nasal deconges¬ 
tant, and it has CNS-mediated anorexigenic 
effect (52, 53). It is said to have somewhat less 
central stimulant action than the correspond¬ 
ing aromatic ring derivatives (54a-d). 

The benzene (88) and cyclohexane (89) 
congeners have almost identical effects in 
blocking bronchoconstriction produced by his¬ 
tamine, serotonin, or acetylcholine in the 
guinea pig in vivo (55). They also showed identical
LD50 values in mice. The stereochemistry
of these compounds was not addressed. 


r-ch 2 -o 



H 


o 


ch 2 —n-ch-c-nh 2 

I 

CH 3


(90) R = C 6 H 5

(91) R = c-C 6 H 11

only a slight difference in potency between 
(R)- and (S)-(90). The (R)-enantiomer of (91)
was not reported. 

5 ALTERATION OF STEREOCHEMISTRY 
AND DESIGN OF STEREOISOMERS 
AND GEOMETRIC ISOMERS 

The earlier, almost universally accepted belief 
that if one enantiomer of a chiral molecule 
demonstrates pharmacological activity, the 
other enantiomer will be pharmacologically 
inert, is not valid. It must be anticipated that 
all stereoisomers of an organic molecule will 
exhibit pharmacological effects, frequently 
widely different and unpredictable. Many ex¬ 
amples of qualitative and quantitative differ¬ 
ences in metabolism of enantiomers are docu¬ 
mented (57). 

(±)-3-(3-Hydroxyphenyl)-N-n-propylpiperidine
(3-PPP, 92) was originally described
(58) as having highly selective activity at do¬ 
paminergic autoreceptors. 

At high doses (R)-(92) selectively stimu¬ 
lated presynaptic dopaminergic receptor sites, 
whereas at lower doses it selectively stimu¬ 
lated postsynaptic receptor sites (59). In con¬ 
trast, the (S)-enantiomer stimulated presynaptic
dopamine receptors and at the same dose
level, it blocked postsynaptic dopamine recep-

ch 3 

(92) 


tors. Thus, this enantiomer exhibits a bifunctional
mode of dopaminergic attenuation: that
of presynaptic agonism and postsynaptic antagonism.
The observed pharmacological effects
of the racemic modification are the sum
total of the complex activities of the two enan¬ 
tiomers, and the pharmacology of racemic 
3-PPP is not an accurate reflection of the 
pharmacological properties of the individual 
enantiomers. The contemporary literature 
strongly reflects the philosophy that pharma¬ 
cological testing only of a racemic mixture is 
inadequate and may be misleading. 

(R)-(-)-11-Hydroxy-10-methylaporphine
(93) is a highly selective serotonergic 5-HT1A
agonist (60).

Remarkably, the (S)-enantiomer (94) is a 
potent antagonist at this same subpopulation 
of serotonin receptors (guinea pig ileum prep¬ 




aration) (61). Both enantiomers bind strongly
to 5-HT1A receptors from rat forebrain membrane.
The phenomenon of enantiomers that
possess opposite effects (agonist-antagonist) 
at the same receptor, once considered to be 
extremely rare, has recently been noted more 
often, probably because of the increasing rec¬ 
ognition by medicinal chemists and pharma¬ 
cologists that each member of an enantiomeric 
pair may possess its own unique and unpre¬ 
dictable pharmacology. 

In addition to stereochemistry about a car¬ 
bon center, other potentially chiral atoms of¬ 
fer possibilities for pharmacological signifi¬ 
cance. A gastroprokinetic compound (95) with


(95) 

serotonergic activity bears a chiral sulfoxide 
moiety (62). The enantiomers are equipotent, 
but the (S)-enantiomer demonstrates a greater 
intrinsic activity than that of the (R)-enantio- 
mer. 

Casy (63) cited pharmacological differences 
between stereoisomers of chiral sulfoxide moi¬ 
eties in cholinergic oxathiolane congeners 
(96-99) of muscarine. 


(96) 

cis- and trans-4-Aminocrotonic acids (100)
and (101) were prepared (64) as congeners of
γ-aminobutyric acid (GABA) (6).



(100)

(101)

The folded Z-isomer (100) was inactive in
assays for GABA agonism, whereas the ex¬ 
tended E-isomer (101) was active. These data 
demonstrate biological differences of geomet¬ 
ric isomers, which in turn involve a parameter 
discussed previously: imposition of a degree of 
structural rigidity on the molecule. A strategy 
analogous to this E/Z olefinic GABA congener
design addressed cis- and trans- 1,2-disubsti- 
tuted cyclopropane derivatives (102) and 
(103), whose relative effects at GABA recep¬ 
tors paralleled those of the olefinic derivatives 
(65). 

The E-isomer of the diethylstilbestrol 
structure (104) has 10 times the estrogenic 
potency of the Z-isomer; this effect has been 
rationalized from the conclusion that the E- 


geometric isomer is an open-chain analog of
the natural estrogen estradiol (105) (66). In
dienestrol (106), the geometric isomerism pos¬ 
sible with olefinic moieties has been further 



(106)

exploited to achieve a similar kind of open-
chain analogy to the steroid ring system as in 
diethylstilbestrol, and a high level of estro¬ 
genic activity results. 

Hexestrol (107), the saturated congener of 


(107) 



diethylstilbestrol (104), is the meso-form of 
the molecule. It has the greatest estrogenic 
potency of the three possible stereoisomers; 
however, it is less potent than diethylstilbes¬ 
trol (67). 

A partial restriction of side-chain flexibility 
in retinoic acid (108) was achieved by incorpo¬ 
rating portions of the side chain into a ben¬ 
zene ring and a cyclopropane ring (109) (68). 




Introduction of the cyclopropane ring 
changes the corresponding trans-olefinic moi¬ 
ety of (108) to a cisoid disposition in (109), 
thus changing the overall steric disposition of 
the side chain. Moreover, the cyclopropane 
ring introduces chirality into the molecule. 
The (S,S)-enantiomer shown is a potent reti¬


noid X-receptor ligand and it is inactive at the 
retinoic acid receptor, whereas the (R,R)-en-
antiomer is an extremely weak agonist at the 
retinoid X-receptor, although it has some ef¬ 
fect at the retinoic acid receptor. Thus, the 
molecular modifications shown in (109) re¬ 
sult in selectivity of action at these two re¬ 
ceptors. 


6 FRAGMENTS OF THE LEAD MOLECULE 


Design of fragments of a lead molecule is based 
on the premise that some lead molecules, es¬ 
pecially polycyclic natural products, may be 
much more structurally complex than is nec¬ 
essary for optimal pharmacologic effect. It is 
hypothesized that a pharmacophoric moiety 
may be buried within the complex structure of 
the lead compound and, if this pharmacophore 
can be clearly defined, it may be possible to 
"dissect" it out chemically. The result may be 
biologically active, simpler molecules that may 
themselves be used as leads in further analog 
design. A bond disconnection strategy may be 
employed in which bonds in the polycyclic 
structure are broken or removed to destroy 
one or more of the rings. The result may be a 
valuable drug that is more accessible (through 
chemical synthesis) than the original lead 
molecule. A possible disadvantage to this 
strategy of analog design is that the greater 
flexibility that is introduced into a rigid mole¬ 
cule may compromise or destroy the confor¬ 
mational integrity that may have existed in 
the pharmacophoric portion, at the expense of 
activity and/or potency. There may be a simi¬ 
lar destruction of chiral centers, which may be 
undesirable. Morphine (110) typifies a lead 
molecule for which fragment analog design 
has been used. 



(110)

The analgesic μ-receptor pharmacophore
of morphine has been defined (69) as compris¬ 
ing the basic nitrogen atom, the aromatic ring 
located three carbon atoms from the nitrogen, 
and a quaternary carbon adjacent to the aro¬ 
matic ring, which provides a region of molec¬ 
ular bulk. A bond disconnection strategy in¬ 
volved disruption of the hydrofuran ring to 
give rise to morphinan derivatives [e.g., levor- 
phanol (111)], whose pharmacologic effects



closely parallel those of morphine (70). Fur¬ 
ther simplification of the morphine ring sys¬ 
tem led to benzomorphan derivatives, typified 
by metazocine (112), in which morphinelike



( 112 ) 


analgesic activity is retained. Finally, 4-phe- 
nylpiperidine derivatives typified by meperi¬ 
dine (113) and the nonheterocyclic system 



(113) 


methadone (114) present the putative analge¬ 
sic pharmacophore with a seemingly minimal 
number of extraneous atoms. These simple 
compounds retain opioid analgesic activity. It 



must be noted, however, that the discovery of 
analgesic activity in 4-phenylpiperidine deriv¬ 
atives was not a result of a systematic struc¬ 
ture-activity study of the morphine molecule, 
but was serendipitous (71). 

Asperlicin (115), a potent cholecystoki- 
nin-A antagonist, was subjected to two differ¬ 
ent bond disconnection strategies, as indi¬ 
cated (72). 

Path A leads to tryptophan derivatives 
(116), some of which are potent cholecystoki- 
nin antagonists (73). Some quinazolinone de¬ 
rivatives (117) of disconnection pathway B 
showed extremely high potency and excellent 
selectivity as cholecystokinin-B receptor sub- 
type ligands (44). A combination of X-ray crys¬ 
tallography and computational chemistry was 
used in the decision-making process in the 
bond disconnection (44) and in the design of
the specific target molecules. 

The myoneural-blocking pharmacophore 
in d-tubocurarine (118) was speculated to in¬ 
clude the two cationic heads (the quaternary 
ammonium group and the protonated tertiary 
amine); the cationic heads are separated by 10 
atoms (nine carbons and one oxygen). 

Based on these parameters, a simple mole¬ 
cule, decamethonium (119), in which two tri- 
methylammonium heads are separated by 10 
methylene groups to approximate the interni¬ 
trogen distance in d-tubocurarine, was de¬ 
signed independently by two groups of inves¬ 
tigators (74, 75). This synthetic fragment/ 
analog of d-tubocurarine exhibits a high 
degree of potency and activity in production of
flaccid paralysis of skeletal muscles, superfi¬ 
cially like that of the lead compound. How¬ 
ever, the myoneural blockade from d-tubocu¬ 
rarine is of the nondepolarizing type, whereas 
decamethonium produces a depolarizing skel-
(117)


etal muscle blockade. This fundamental mech¬ 
anistic difference is probably attributable, at
least in part, to the flexibility of the decame- 
thonium molecule compared with that of d- 
tubocurarine. There is a considerable differ¬ 


ence in the spectrum and severity of side 
effects and in the technique of employment of 
these two drugs in clinical practice. In all types 
of analog design, changes in chemical struc¬ 
ture may result in unanticipated changes in 



(118)

(CH 3 ) 3 N—(CH 2 ) 10 —N(CH 3 ) 3

(119)

mechanism of action, even though the chemi¬ 
cal nature of the pharmacophore may not be 
altered. 

7 VARIATION IN INTERATOMIC 
DISTANCES 

Alteration of distances between portions of 
the pharmacophore of a molecule (or even be¬ 
tween other portions of the molecule) may 
produce profound qualitative and/or quantita¬ 
tive changes in pharmacological actions. In 
α,ω-bis-trimethylammonium polymethylene
compounds (120-123), maximal activity for 


duces a pharmacological change from gangli¬ 
onic blockade to myoneural blockade, and fur¬ 
ther extension to 16-18 methylenes results in 
loss of myoneural effects and a return of gan¬ 
glionic blocking action. 
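
The dependence of interquaternary distance on chain length can be estimated from simple geometry. The sketch below is an illustration only: it assumes a fully extended, all-anti (planar zigzag) chain with textbook bond lengths and a tetrahedral bond angle, none of which are taken from the cited studies, and it gives the maximum separation that a flexible chain can reach rather than the distance adopted at the receptor.

```python
import math

# Assumed textbook geometry, not values taken from the cited studies.
C_C = 1.54                            # C-C single bond length, in angstroms
C_N = 1.49                            # C-N+ single bond length, in angstroms
HALF_ANGLE = math.radians(109.5 / 2)  # half of the tetrahedral bond angle


def extended_n_to_n(n_methylenes: int) -> float:
    """N...N distance (angstroms) for an all-anti N-(CH2)n-N zigzag chain."""
    # One C-N bond at each end, with n - 1 C-C bonds in between.
    bonds = [C_N] + [C_C] * (n_methylenes - 1) + [C_N]
    x = y = 0.0
    for i, length in enumerate(bonds):
        x += length * math.sin(HALF_ANGLE)  # progress along the chain axis
        sign = 1 if i % 2 == 0 else -1      # bonds alternate above and below the axis
        y += sign * length * math.cos(HALF_ANGLE)
    return math.hypot(x, y)


for n in (5, 6, 10, 16, 18):
    print(f"n = {n:2d}: about {extended_n_to_n(n):.1f} angstroms between nitrogens")
```

With these assumptions the pentamethylene and decamethylene chains span roughly 7.5 and 13.8 angstroms, respectively, when fully extended. Because the long hexadecyl and octadecyl chains can fold, their return of ganglionic activity need not reflect the extended distance at all.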

Hemicholinium (124) competitively inhib¬ 
its the high-affinity, sodium-dependent uptake
of choline into the nerve terminal (the rate-determining
step in acetylcholine synthesis in
the nerve terminal), thus depleting stores of
acetylcholine and producing slow onset, long- 
duration myoneural blockade (78, 79). In a se¬ 
ries of congeners of hemicholinium, the cen¬ 
tral biphenyl portion of the molecule was 
changed to terphenyl (125) and to p-phe- 
nylene (126). Both changes resulted in pro¬ 
found loss of the myoneural blockade charac¬ 
teristic of hemicholinium (68). This result was 
ascribed to alteration of the proposed opti- 


(CH 3 ) 3 N—(CH 2 ) n —N(CH 3 ) 3

(120) n = 5

(121) n = 6

(122) n = 16 

(123) n = 18 



/ N- 

h 3 c' x ch 3 


-N + 

h 3 c' x ch 3 


blockade of autonomic ganglia (nicotinic N 2 
receptors) resides in those derivatives where n 
= 5 or 6 (compounds 120 and 121) (76, 77). 

Ganglionic effects drop drastically when n 
= 4 or 7. These observations have been ratio¬ 
nalized as being a reflection of attainment of 
optimal interquaternary distance in the 
penta- and hexamethylene congeners, for op¬ 
timal interaction with ganglionic receptor 
subsites. Remarkably, as the number of meth¬ 
ylene groups in (120) is greatly increased, a 
high level of ganglionic-blocking potency re¬ 
turns. The hexadecyl and octadecyl congeners
(122) and (123) are approximately four times 
as potent at autonomic ganglia as the penta- 
and hexamethylene compounds. As was men¬ 
tioned previously, polymethylene bis-quater- 
nary systems, in which the cationic heads are 
separated by 10 methylene groups, have po¬ 
tent effects at myoneural junctions (nicotinic 
N 1 receptors) and have little ability to affect
nerve activity at autonomic ganglia. Thus, 
extension of a bis-quaternary polyalkylene 
molecule from five or six methylenes to 10 pro- 


(124) R =


/ V_/ \ 


(125) R = 


(126) R = 


(127) R = 


(128) R = 


(129) R = 


/ V / w ^ 




-CH? 




h 3 c 


(130) R = —(CH 2 ) 6 —

(131) R = —(CH 2 ) 7 —

CH 2 —(CH 2 ) m —CH 2



(132) 


mum interquaternary nitrogen distance of 
14.4 Å in hemicholinium (124), to 18.4 Å in the
terphenyl analog, and to 10.2 Å in the p-phenylene
analog.
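
The three interquaternary distances quoted above differ by nearly constant steps, as expected if each step adds or removes one para-linked benzene ring. A rough consistency check follows, assuming idealized bond lengths (an aromatic C-C bond of 1.39 Å and an inter-ring C-C bond of 1.48 Å) and a linear para-linked spacer; these values are illustrative assumptions, not data from the cited work.

```python
# Interquaternary distances (angstroms) quoted in the text.
reported = {
    "p-phenylene analog (126)": 10.2,
    "hemicholinium, biphenyl (124)": 14.4,
    "terphenyl analog (125)": 18.4,
}

# Assumed idealized geometry for one added para-phenylene unit.
AROMATIC_C_C = 1.39    # bond length within the benzene ring
INTER_RING_C_C = 1.48  # single bond joining two rings

# Para carbons of a regular hexagon are two bond lengths apart.
phenylene_span = 2 * AROMATIC_C_C + INTER_RING_C_C

distances = sorted(reported.values())
increments = [round(b - a, 1) for a, b in zip(distances, distances[1:])]

print(f"expected span per added ring: about {phenylene_span:.2f} angstroms")
print(f"observed increments: {increments}")
```

The observed steps are close to the span estimated for one ring, which supports reading the series as a systematic variation of a single distance parameter.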

The central biphenyl spacer in hemicho- 
linium was changed to a 2,7-disubstituted 
phenanthrene (127), trans,trans-4,4'-bicyclohexyl
(128), and 2,2'-dimethylbiphenyl (129).
In all three of these systems the 14.4-Å inter¬
quaternary distance found in hemicholinium 
was maintained; all of these congeners were 
qualitatively and quantitatively similar to 
hemicholinium in inhibition of neuromuscu¬ 
lar transmission. Conformational analysis of 
the polyalkylene congeners (130) and (131) 
demonstrated that, when the flexible polyal¬ 
kylene chain is maximally extended and is in a 
staggered conformation, the interquaternary 
distance in the hexamethylene congener (130) 
is approximately 14 Å, and in the heptamethylene
congener (131) it is approximately 15 Å.
Both compounds exhibited hemicholinium- 
like inhibition of neuromuscular transmis¬ 
sion, although they were less potent than 
hemicholinium (80). This diminution of potency
might be ascribed to the compromising
of another structural parameter in the hemi¬ 
cholinium molecule: the rigidity of the central 
biphenyl spacer unit that maintains the inter¬ 
nitrogen distance. 

Replacement of the benzene ring linkers of 
(82) (see above) by alkyl linkers (structure 
132) permitted retention of blocking activity 
on calcium-activated potassium channels (81). 
The most potent member of the series studied 
was that in which m = n = 3. In this compound 
the two respective internitrogen distances 
closely approximate those in the benzene ring- 
linked compound (82). 


In a series of phenylalkylenetrimethylam- 
monium derivatives (133-136), nicotinic agonism
is maximal when n = 3 (compound 136).

C 6 H 5 —(CH 2 ) n —N(CH 3 ) 3

(133) n = 0

(134) n = 1

(135) n = 2 

(136) n = 3 

It was concluded (82) that a moiety (here, a 
benzene ring) with high electron density three 
or four single bond lengths (~6 Å) from the
cationic center is a requirement for nicotinic 
agonism in the series. These conclusions may 
be compromised by the fact that the alkylene 
series was not extended beyond the three-car¬ 
bon spacer chain. Therefore, it is not known 
whether the four-carbon homolog would dis¬ 
play greater or lesser potency than that of the 
three-carbon molecule. Peculiarly, the first 
two members of the series have only very weak 
nicotine-like activity in the presence of atro¬ 
pine. 

A series of compounds, illustrated by (137),
was evaluated for in vitro affinity for α1- and
α2-adrenoceptors by radioligand-binding assays
(83). All compounds showed good affinities
for the α1 adrenoceptor, with K i values in
the low nanomolar range. The polymethylene
chain spacer between furoylpiperazinylpyridazinone
and aryl piperazine moieties was shown
to influence the affinity and selectivity of these
compounds. A gradual increase in affinity for
the α1 adrenoceptor was observed, by lengthening
the polymethylene chain, up to a maximum
of seven carbon atoms.

The α2/α1 ratio of adrenoceptor-binding affinities
for the series of compounds did not
parallel the α1 adrenoceptor-binding affinities
for the series, although all of the seven (C 1 -C 7 )
congeners of (137) had somewhat more affinity
for the α1 receptor.

1 INTRODUCTION

Many of the top 100 drugs sold worldwide are 
enzyme inhibitors. In recent years, enzyme in¬ 
hibitors not only have provided an increasing 
number of potent therapeutic agents for the 
treatment of diseases, but also have signifi¬ 
cantly advanced the understanding of enzy¬ 
matic transformations. The aim of this chap¬ 
ter is to present current approaches to so- 
called rational inhibitor design, which uses 
knowledge of enzymic mechanisms and struc¬ 
tures in the design process. Rational inhibitor 
design is intended to complement laborious 
and resource-consuming screening processes, 
which consist of testing large numbers of syn¬ 
thetic chemicals or natural products for inhib¬ 
itory activity against a chosen target enzyme. 

1.1 Enzyme Inhibitors in Medicine 

A human cell contains thousands of enzymes 
each of which can, theoretically, be selectively 
inhibited. These enzymes constitute the vari¬ 
ous metabolic pathways that, in concert, pro¬ 
vide the requirements for the viability of the 
cell. A selective inhibitor may block either a 
single enzyme or a group of enzymes, leading 
to the disruption of a metabolic pathway(s). 
This will result in either a decrease in the con¬ 
centration of enzymatic products or an in¬ 
crease in the concentration of enzymatic sub¬ 
strates. The effectiveness of an enzyme 
inhibitor as a therapeutic agent will depend on 
(1) the potency of the inhibitor, (2) its specificity
toward its target enzyme, (3) the choice of
metabolic pathway targeted for disruption, 
and (4) the inhibitor or a derivative possessing 
appropriate pharmacokinetic characteristics. 
Higher potency will mean less drug is required 
to obtain a physiological response, whereas 
high specificity means that the inhibitor will 
react only with its target enzyme and not with 
other sites in the body. Taken together, low
dosage and high specificity combine to reduce
both the toxicity caused by inhibition of other 
vital enzymes and the problems arising from 
the formation of toxic decomposition prod¬ 
ucts. Further, high specificity will generally 
avoid depletion of the inhibitor concentrations 
in the host by nonspecific pathways. The areas 
of potency and specificity will both be ad¬ 
dressed in this chapter. Clearly, the choice of 
target enzyme is also of prime importance for 
chemotherapy, although that is beyond the 
scope of this review. However, there are a 
number of texts available that provide a good 
introduction to this subject (1-4). Good bio¬ 
availability of the drug is also crucial for the 
drug to reach its site of action in the body in 
effective therapeutic concentrations. For ex¬ 
ample, highly polar or charged compounds, 
such as phosphorylated compounds, fre¬ 
quently cannot readily cross cell membranes 
and are therefore generally less useful as 
drugs. Physical approaches to facilitate the 
transport of this class of compounds into the 
cell include the use of liposomes or nanopar¬ 
ticles (5-7). Chemical approaches may also be 
employed. These include the use of prodrugs, 
in which functional groups on the inhibitor 
are modified in such a manner that they are 
able to be taken up by the cell and, later, met- 
abolically converted to the active drug. Pro¬ 
drugs are discussed in more detail in Volume 
2, Chapter 14. 

As indicated earlier, a wide variety of en¬ 
zyme inhibitors have found use in the clinic. 
Tables 17.1-17.3 show a number of these com¬ 
pounds and, although they provide by no 
means an exhaustive list, they do give an indi¬ 
cation of the range of human disease states 
that can be ameliorated with the use of en¬ 
zyme inhibitors. 

The human body, even though its defenses 
are constantly on guard, is still susceptible to 
invasion by foreign pathogens. Since the de-

Table 17.1 Examples of Enzyme Inhibitors Used in the Treatment of Bacterial, Fungal, Viral,
and Parasitic Diseases

Clinical Use    Enzyme Inhibited                          Inhibitor
Antibacterial   Dihydropteroate synthetase                Sulphonamides
Antibacterial   Dihydrofolate reductase                   Trimethoprim, methotrexate
Antibacterial   Alanine racemase                          D-Cycloserine
Antibacterial   Transpeptidase                            Penicillins, cephalosporins
Antifungal      Fungal sterol 14α-demethylase             Clotrimazole, ketoconazole
Antifungal      Fungal squalene epoxidase                 Terbinafine, naftifine
Antiviral       Thymidine kinase and thymidylate kinase   Idoxuridine
Antiviral       DNA, RNA polymerases                      Cytosine arabinoside (Ara-C)
Antiviral       Viral DNA polymerase                      Acyclovir, vidarabine
Antiviral       HIV reverse transcriptase                 Dideoxyinosine, zidovudine
Antiviral       HIV protease                              Saquinavir
Antiviral       Influenza virus neuraminidase             Zanamivir, oseltamivir
Antiprotozoal   Pyruvate dehydrogenase                    Organoarsenical agents
Antiprotozoal   Ornithine decarboxylase                   α-Difluoromethylornithine


velopment of the sulfa drugs (sulfonamides), 
enzyme inhibitors have played a vital role in 
controlling these infectious agents. Table 17.1 
provides a list of enzyme inhibitors that have 
been used in the treatment of the various dis¬ 
eases caused by these agents. All these com¬ 
pounds needed to satisfy the usual require¬ 
ments for specificity and low toxicity. 

This can be achieved in a variety of ways. 
For instance, it is possible to inhibit an es¬ 
sential pathway in the pathogen that does 
not exist in the host. D-Cycloserine (1) (Fig. 
17.1), for example, inhibits alanine race- 
mase, an enzyme involved in bacterial cell 
wall biosynthesis and not found in humans 
(8, 9). D-Cycloserine is active against a broad 
spectrum of both gram-positive and gram¬ 
negative bacteria (10), but plays its major 
role in the treatment of tuberculosis (11). 
Conversely, even if both host and pathogen 
contain the same enzymes, it may be possi¬ 


ble to exploit subtle structural differences 
between the isozymes to obtain a highly spe¬ 
cific inhibitor that preferentially binds to 
the invader’s version. Trimethoprim (2) 
shows this selective toxicity. An inhibitor of 
dihydrofolate reductase, trimethoprim is a 
potent antibacterial agent because the bac¬ 
terial enzyme is inhibited at a concentration 
several thousand times lower than that re¬ 
quired for inhibition of the mammalian 
isozyme (12). Acyclovir (3a), an antiviral 
drug used for the treatment of herpes infec¬ 
tions (13, 14), also fits into this category. It
binds very tightly to the Herpes simplex 
DNA polymerase with an estimated half-life 
of about 40 days. Acyclovir is a prodrug be¬ 
cause it requires transformation by a viral 
thymidine kinase and cellular phosphotrans¬
ferases to the corresponding triphosphate 
(3b) to serve in vivo as an inhibitor of the 
viral DNA polymerase (15). 


Table 17.2 Examples of Enzyme Inhibitors Used in the Treatment of Cancer

Type of Cancer                                          Enzyme Inhibited                  Inhibitor
Benign prostatic hyperplasia                            Steroid 5α-reductase              Finasteride
Estrogen-mediated breast cancer                         Aromatase                         Aminoglutethimide, fadrozole
Leukemia, osteosarcoma, head, neck, and breast cancer   Dihydrofolate reductase           Methotrexate
Colorectal cancer                                       Thymidylate synthase              5-Fluorouracil
Leukemia                                                Glutamine-PRPP amidotransferase   6-Mercaptopurine, azathioprine
Small-cell lung cancer, non-Hodgkin's lymphoma          Topoisomerase II                  Etoposide
Hairy-cell leukemia                                     Adenosine deaminase               Pentostatin

Table 17.3 Examples of Enzyme Inhibitors Used in Various Human Disease States

Clinical Use        Enzyme Inhibited                                        Inhibitor
Epilepsy            GABA transaminase                                       γ-Vinyl GABA
Epilepsy            Carbonic anhydrase                                      Sulthiame
Epilepsy            Succinic semialdehyde dehydrogenase                     Sodium valproate
Antidepressant      Monoamine oxidase (MAO)                                 Tranylcypromine, phenelzine
Antihypertensive    Angiotensin converting enzyme                           Captopril, enalaprilat
Cardiac disorders   Na+,K+-ATPase                                           Cardiac glycosides
Gout                Xanthine oxidase                                        Allopurinol
Ulcer               H+,K+-ATPase                                            Omeprazole
Hyperlipidemia      HMG-CoA reductase                                       Atorvastatin, simvastatin
Anti-inflammatory   Prostaglandin synthase, cyclooxygenase (COX) I and II   Aspirin, naproxen, ibuprofen
Arthritis           Cyclooxygenase (COX) II                                 Celecoxib
Glaucoma            Acetylcholinesterase                                    Neostigmine
Glaucoma            Carbonic anhydrase II                                   Acetazolamide, dichlorphenamide


Although their inhibitors are not specifically therapeutic agents in themselves, the β-lactamases are another important target for drug design. These are bacterial enzymes and, as with the alanine racemases, are not found in humans. Inhibitors of β-lactamases include clavulanic acid (4) (16-20) and sulbactam (penicillanic acid sulfone) (5) (18, 21-24). These two compounds act to prevent the bacterial degradation of penicillins and cephalosporins by β-lactamases, thereby extending the lifetime and effectiveness of these antibiotics. Accordingly, both clavulanic acid (4) and sulbactam (5) have reached the market as drugs that act synergistically with these commonly prescribed antibacterial agents.

Even though it has proved possible to selectively inhibit the enzymes of a number of pathogens, the enzymes of cancer cells have proved to be a far more elusive target. Indeed, the majority of the currently employed antitumor agents can be described as antiproliferative agents. These take advantage of the fact that many, but not all, tumor cells grow and divide more rapidly than normal cells. Lymphomas, for example, proliferate more rapidly than solid tumors, whereas, conversely, acute leukemia cells divide more slowly than the surrounding bone marrow cells. Most of the enzyme inhibitors used as these antiproliferative agents (Table 17.2) can also be described as antimetabolites (i.e., they inhibit a metabolic pathway), often those involved in DNA biosynthesis, which are important for cell survival or replication. 5-Fluorouracil (6), the prodrug form of an inactivator of thymidylate synthase (25), and methotrexate (7), an inhibitor of dihydrofolate reductase (26, 27), both fit into this category. Unfortunately, rapidly dividing normal cells, such as hair follicles, the cells lining the gastrointestinal tract, and the bone marrow cells involved in the immune system, are also significantly affected. The resultant hair loss, nausea, and susceptibility to infection mean that this type of chemotherapy is seldom employed as a first-line defense against cancer.

The inhibition of enzymes involved in metabolic pathways is not restricted to anticancer agents. A variety of diseases have been correlated with either the dysfunction of an enzyme or an imbalance of metabolites. A cross section of the disease states treated with enzyme inhibitors is shown in Table 17.3. Practically, these may be treated by the inhibition of an individual enzyme or by using enzyme inhibitors to regulate the metabolite concentration in the body. For example, an imbalance of the two neurotransmitters, glutamate and γ-aminobutyric acid, is responsible for the convulsions observed in epileptic seizure. The latter is metabolized by γ-aminobutyric acid aminotransferase (GABA-T) and, consequently, inhibitors of this enzyme offered themselves as potential antiepileptic candidates. This led to the development of the GABA-T inhibitor, vigabatrin (8) (28), which clinically results in an increase of the brain concentration of γ-aminobutyric acid and cessation of epileptic convulsions. As with the anticancer agents, blockade of a metabolic pathway may also have therapeutic benefits. The statins, a group of serum cholesterol-lowering drugs, are inhibitors of hydroxymethylglutaryl-CoA (HMG-CoA) reductase (29). HMG-CoA reductase catalyzes the irreversible conversion of HMG-CoA to mevalonic acid, the rate-determining step in cholesterol biosynthesis (30-32). Inhibitors such as simvastatin (9) have been found to be effective in the treatment of hyperlipidemia and familial hypercholesteremia (33, 34) and have become some of the world's best-selling drugs.

Finally, enzyme inhibitors can also be used to induce an animal model of a genetic disease. Inactivation of γ-cystathionase by propargylglycine, for example, produces an experimental model of the disease state known as cystathioninuria (35). Deficiency of this enzyme leads to the accumulation of cystathionine in the urine and has sometimes been associated with mental retardation (36).

Figure 17.1. Examples of enzyme inhibitors used clinically.

1.2 Enzyme Inhibitors in Basic Research

In basic research enzyme inhibitors have found a multitude of uses. They serve as useful tools for the elucidation of structure and function of enzymes, as probes for chemical and kinetic processes, and in the detection of short-lived reaction intermediates (37). Product inhibition patterns provide information about an enzyme's kinetic mechanism and the order of substrate binding (38). Covalently binding enzyme inhibitors have been used to identify active-site amino acid residues that could potentially be involved in substrate binding and catalysis of the enzyme (39,40). Reversible enzyme inhibitors are routinely used to facilitate enzyme purification by using the inhibitor as a ligand for affinity chromatography (41,42) or as eluants in affinity-elution chromatography (43). Immobilized enzyme inhibitors can also be used to identify their intracellular targets (44), whereas irreversible inhibitors can be used to localize and quantify enzymes in vivo (45).

In Table 17.4 we have provided the classification of the various types of enzyme inhibitors that we employ in this chapter. The classification may appear somewhat arbitrary, in that some inhibitors may fit into more than one category. This can arise because these categories are attempting to bring together some nonrelated properties such as structure, mechanism of action, and kinetic behavior. Thus, what we have classed as a reversible inhibitor may, simply because it has a slow dissociation rate, be described elsewhere in the literature as being irreversible. In each instance we will discuss approaches to the design of that type of inhibitor, as well as indicating how it may be evaluated. The discussion will be accompanied by references to recent, representative examples from the literature. Where appropriate, these examples will be of inhibitors of therapeutic interest.

Table 17.4 Classification of Enzyme Inhibitors Employed in This Chapter

Noncovalent Inhibitors                              Covalent Inhibitors
Rapid reversible inhibitors (ground-state analogs)  Chemical modifiers
Tight, slow, slow-tight binding inhibitors          Affinity labels
Multisubstrate analogs                              Mechanism-based inhibitors
Transition-state analogs                            Pseudoirreversible inhibitors

It should be noted that we will concentrate on inhibitors directed at the active site of the enzyme. Although some inhibitors, such as allosteric effectors, bind to regions other than the active site, these are not the focus of this chapter and will not be included. There are many reviews of enzyme inhibitors available in the literature (37, 46-48) and the reader is referred to them for more detailed analysis.

2 RATIONAL DESIGN OF NONCOVALENTLY BINDING ENZYME INHIBITORS

As their name indicates, this class of inhibitors binds to the enzyme's active site without forming a covalent bond. Therefore the affinity and specificity of the inhibitor for the active site will depend on a combination of the electrostatic and dispersive forces, and hydrophobic and hydrogen-bonding interactions. Traditionally, noncovalently binding enzyme inhibitors were analogs of substrates, products, or reaction intermediates. More recently, an explosion in the use of combinatorial chemistry and rapid screening techniques has seen the development of large numbers of enzyme inhibitors that bear little or no resemblance to the substrate or products, yet still bind selectively to their target enzyme. Computer-aided drug design, in the broadest sense, encompasses both structure-based drug design and quantitative structure-activity relationship (QSAR) methods. A complement to the rapid screening techniques, computer-aided methods provide a more focused approach to the design and discovery of both substrate and nonsubstrate analog inhibitors.

In structure-based design, the structure of a drug target interacting with small molecules is used to guide drug discovery. Consequently, either the three-dimensional enzyme structure or, at a minimum, the pharmacophore structure must be known. A pharmacophore represents the nature of the chemical groups of a given ligand and their relative orientation important for inhibitor binding. Today, structure-based design, used in conjunction with docking techniques, combinatorial chemistry, and rapid screening, not only leads more quickly to novel enzyme inhibitors but also greatly reduces the number of compounds that must be synthesized. More information on these approaches may be found in Chapter 10 and some recent monographs (49-52).

Traditionally, an increase in inhibitory or biological activity was achieved by synthesizing an analog of the substrate and then making gradual empirical changes in the structure by adding or removing functional groups. QSAR methods provide a means of making this empirical testing more focused. In this technique there is no need to know the structure of the active site. Instead, computer algorithms are employed to correlate the biological activity of a series of inhibitors with their chemical structure, thereby allowing better predictions as to how to change the structure to obtain a more potent inhibitor. This topic is discussed further in Chapter 1, and detailed reviews are also available (53-56).

Table 17.4 shows the classification of noncovalent inhibitors we use in this chapter. Based on their kinetics it is possible to distinguish among rapid reversible, tight-binding, slow-binding, slow-tight-binding, irreversible, and pseudoirreversible inhibitors. Conversely, inhibitors classified on the basis of structure, such as ground-state analogs, multisubstrate inhibitors, and transition-state analogs, which mimic the structures of substrates and products, reaction intermediates, and transition states, may fall into any of the kinetic categories. However, before introducing these categories, it is important to have an understanding of the forces involved in the binding of substrates and inhibitors to an enzyme's active site.

2.1 Forces Involved in Forming the Enzyme-Inhibitor Complex

To understand the design concepts of the various types of noncovalently binding enzyme inhibitors, a basic knowledge of the binding forces between an enzyme's active site and its inhibitors is required. The forces involved in a substrate or an inhibitor binding to an enzyme's active site are, as with a drug binding to a receptor, the same forces that are experienced by all interacting organic molecules. These include ionic (electrostatic) interactions, ion-dipole and dipole-dipole interactions, hydrogen bonding, hydrophobic interactions, and van der Waals interactions. A brief overview of the forces involved follows. More comprehensive treatments can be found in Chapter 4 and elsewhere (57-60).

The binding of an inhibitor is dependent on a variety of interactions, and it is the sum of these interactions that will determine the degree of affinity of an inhibitor for the particular enzyme. The reversible binding of an inhibitor to an enzyme's active site can be described as shown in Equation 17.1.

$$\mathrm{E} + \mathrm{I} \underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}} \mathrm{E \cdot I} \tag{17.1}$$

There is an equilibrium between the free enzyme (E), inhibitor (I), and the enzyme-inhibitor complex (E·I). The affinity of an inhibitor for the enzyme is measured by the inhibition constant $K_i$, which is the dissociation constant of the enzyme-inhibitor complex at equilibrium (Equation 17.2).

$$K_i = \frac{[\mathrm{E}][\mathrm{I}]}{[\mathrm{E \cdot I}]} \tag{17.2}$$
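
As a rough numerical illustration of Equation 17.2, the fraction of enzyme present as E·I can be written in terms of the free inhibitor concentration, because [E·I]/([E] + [E·I]) = [I]/($K_i$ + [I]). The sketch below assumes the free inhibitor concentration is known (in practice it is often approximated by the total concentration when inhibitor is in large excess over enzyme); the function name and the example concentrations are illustrative only.

```python
def fraction_enzyme_bound(free_inhibitor: float, k_i: float) -> float:
    """Fraction of enzyme present as E.I at equilibrium.

    Follows from K_i = [E][I]/[E.I] (Equation 17.2):
    [E.I] / ([E] + [E.I]) = [I] / (K_i + [I]).
    Both arguments must use the same concentration units.
    """
    if free_inhibitor < 0 or k_i <= 0:
        raise ValueError("concentrations must be non-negative and K_i positive")
    return free_inhibitor / (k_i + free_inhibitor)


if __name__ == "__main__":
    k_i = 1e-9  # hypothetical 1 nM inhibitor, for illustration only
    for multiple in (0.1, 1.0, 10.0, 100.0):
        bound = fraction_enzyme_bound(multiple * k_i, k_i)
        print(f"[I] = {multiple:>5} x K_i -> fraction bound = {bound:.3f}")
```

When the free inhibitor concentration equals $K_i$, half of the enzyme is bound, which is why a lower $K_i$ corresponds to a more effective inhibitor at a given concentration.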


The lower the $K_i$ value, the better the inhibitor, given that the equilibrium lies more in favor of enzyme-inhibitor complex formation. The affinity of an inhibitor for an enzyme may be related to the standard free energy ($\Delta G^\circ$) of a system by Equation 17.3.

$$\Delta G^\circ = RT \ln K_i \tag{17.3}$$

where R is the universal gas constant and T the temperature in kelvins. The more negative the value of $\Delta G^\circ$, the more favorable the interaction at equilibrium, and the smaller the $K_i$ value. It should be noted that, from Equation 17.3, at physiological temperature relatively small changes in free energy, only 2-3 kcal/mol, will have a significant effect on $K_i$.
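
The sensitivity of $K_i$ to small free-energy changes can be checked numerically from Equation 17.3. The following is a minimal sketch, assuming a physiological temperature of 310.15 K (37°C) and a 1 M standard state; the function names are illustrative and not taken from the text.

```python
import math

R_KCAL = 1.987e-3            # gas constant in kcal/(mol*K)
BODY_TEMPERATURE_K = 310.15  # assumed value for "physiological temperature"


def delta_g_from_ki(k_i_molar: float, temperature_k: float = BODY_TEMPERATURE_K) -> float:
    """Equation 17.3: standard free energy (kcal/mol) for a given K_i in mol/L."""
    return R_KCAL * temperature_k * math.log(k_i_molar)


def ki_fold_change(delta_delta_g_kcal: float, temperature_k: float = BODY_TEMPERATURE_K) -> float:
    """Factor by which K_i changes when the free energy changes by the given amount."""
    return math.exp(delta_delta_g_kcal / (R_KCAL * temperature_k))


if __name__ == "__main__":
    print(f"K_i = 1 nM corresponds to {delta_g_from_ki(1e-9):.1f} kcal/mol")
    for ddg in (2.0, 3.0):
        print(f"{ddg} kcal/mol changes K_i by a factor of about {ki_fold_change(ddg):.0f}")
```

Under these assumptions, a change of 2 kcal/mol alters $K_i$ roughly 25-fold and a change of 3 kcal/mol roughly 130-fold, consistent with the statement that 2-3 kcal/mol has a significant effect.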

The standard free energy ($\Delta G^\circ$) can also be expressed in terms of enthalpic ($\Delta H^\circ$) and entropic ($\Delta S^\circ$) components (Equation 17.4).

$$\Delta G^\circ = \Delta H^\circ - T\Delta S^\circ \tag{17.4}$$

Equation 17.4 states that the free energy of a system is lowered (i.e., the reaction is made more favorable) by either a decrease in enthalpy or an increase in entropy. This is also an important concept because there are both enthalpic and entropic components to the forces that contribute to the strength of the enzyme-inhibitor interaction.

When discussing the forces involved in the noncovalent binding of a substrate/inhibitor to an enzyme, or drug to a receptor, it must be recognized that these interactions will be carried out in an aqueous medium. The physical properties of water mean that noncovalent interactions in aqueous solution will be significantly different from those interactions observed in either an organic medium or in the gas phase. A water molecule has electronic asymmetry; the strongly electronegative oxygen atom withdraws electron density from the hydrogen atoms. This creates partial positive charges on the hydrogens and a partial negative charge on the oxygen. As a result a water molecule possesses a permanent dipole moment, facilitating strong interactions with other water molecules as well as with any charged or polar species.

Water is both a donor and acceptor of hydrogen bonds. Consequently, in bulk solvent, water molecules are extensively hydrogen bonded to each other. These are relatively weak bonds (about 5 kcal/mol) and, at physiological temperature, are rapidly broken and reformed. However, the hydrogen-bonding network affects many of the properties of water. For example, water has a higher melting point, boiling point, and heat of vaporization than those of comparable hydrides such as H₂S and NH₃. The heat capacity of water indicates that it is highly structured and its surface tension (73 dyne cm⁻¹ at 20°C) is considerably higher than that of most liquids (20-40 dyne cm⁻¹). The dielectric constant of water (80) is also considerably higher than that of most liquids, which are generally less than 30. Ethanol, for example, has a dielectric constant of 24, whereas those of benzene and hexane are 2.3 and 1.9, respectively. All told, water is a unique solvent, and one that has a major influence on binding interactions between an enzyme and an inhibitor.

Hydrogen bonds are readily formed between water and biologically important atoms such as the hydrogen bond acceptors N and O and, to a lesser extent, S. The conjugate acids NH and OH may act as hydrogen bond donors. Molecules containing these atoms have the capacity for many hydrogen-bonding interactions with water and, as a result, are usually soluble in water. However, solute-solute hydrogen bonding interactions are less favorable because their formation will require the disruption of favorable solute-water hydrogen bonds. Thus, what may be strong hydrogen bonds in the gas phase, or in organic media, are often considerably weaker in aqueous media.

Water's high dielectric constant makes it extremely effective in solvating, dissociating, and dissolving most salts. Because of its permanent dipole, water is readily able to interact with ionic species, with the result that ionic solute-solute interactions are less favored. The situation is analogous to that observed for hydrogen bonding and again results in a weakening of the normally strong interactions between ions that occur in the gas phase or nonpolar media. This is sometimes described as a "leveling effect."

Small amounts of many nonpolar substances can also dissolve in water. However, these substances do not interact very well with water and prefer to interact with each other. The force driving this interaction, known as the hydrophobic force, is not so much an attraction between hydrophobic molecules as an entropic effect arising from the displacement of water. Indeed, there are no hydrophobic forces in the gas phase or in nonpolar solvents. However, collectively, hydrophobic forces are thought to transcend other types of forces, particularly in the folding of proteins, in all biological systems.

2.1.1 Electrostatic Forces. Although we recognize that, in essence, all forces between atoms and molecules are electrostatic, here we use the term to describe ion-ion, ion-dipole, and dipole-dipole interactions. At physiological pH, the side chains of basic residues such as lysine and arginine and, to a lesser extent, the imidazole ring of histidine will be protonated, whereas the acidic groups on the side chains of aspartic and glutamic acid residues will be deprotonated. In addition, the N-terminal amino groups and C-terminal carboxylates will be ionized. Therefore, in addition to atoms with permanent and induced dipoles, an enzyme potentially will have several charged groups available for binding to charged or polarized groups on a substrate or inhibitor. As described by Equation 17.5, the electrostatic force (F) between the charged atoms ($q_1$ and $q_2$) will depend on the distance between the charged groups (r) and the dielectric constant of the surrounding medium (D).

$$F = \frac{q_1 q_2}{r^2 D} \tag{17.5}$$


The strength of an ion-ion interaction is inversely related to the square of the distance between the ions, whereas ion-dipole and dipole-dipole interactions have $1/r^4$ and $1/r^6$ relationships, respectively. Because the strength of the interaction decreases more slowly with distance, ion-pair interactions can be thought of as long-range interactions. Conversely, interactions involving dipoles are effective over only a short range, although, because they are much more prevalent, dipole interactions may be more significant to the overall binding process. Clearly, the dependency of the strength of interaction on the distance between atoms is an important consideration when designing potential enzyme inhibitors.
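
The difference between these distance dependencies is easy to see numerically. The sketch below simply evaluates the three power laws quoted above for a given change in separation; the function and dictionary names are illustrative, and no absolute energies are implied.

```python
# Exponents of the 1/r**n relationships quoted in the text.
DISTANCE_EXPONENTS = {"ion-ion": 2, "ion-dipole": 4, "dipole-dipole": 6}


def relative_strength(distance_ratio: float, exponent: int) -> float:
    """Strength at (distance_ratio * r) relative to the strength at r for a 1/r**exponent law."""
    if distance_ratio <= 0:
        raise ValueError("distance_ratio must be positive")
    return distance_ratio ** (-exponent)


if __name__ == "__main__":
    for name, exponent in DISTANCE_EXPONENTS.items():
        remaining = relative_strength(2.0, exponent)
        print(f"{name}: doubling the separation leaves {remaining:.4f} of the original strength")
```

Doubling the separation leaves one quarter of an ion-ion interaction but only about 1.6% of a dipole-dipole interaction, which is the sense in which the former is long range and the latter short range.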

Equation 17.5 also implies that electrostatic interactions are less favorable in polar solvents. As discussed above, because of its high net permanent dipole moment, water is very polar and has a large dielectric constant. The high polarity of water greatly diminishes the attraction or repulsion forces between any two charged groups, giving rise to the leveling effect of water. It is somewhat difficult to predict the exact strength of a charge-charge interaction between an enzyme and an inhibitor. For example, the formation of a salt bridge (charge-charge) interaction between an enzyme (Enz) and an inhibitor (I) may be described by Equation 17.6.

$$\mathrm{Enz{-}NH_3^{+}\cdot(H_2O)}_x + \mathrm{I{-}CO_2^{-}\cdot(H_2O)}_y \rightleftharpoons \mathrm{Enz{-}NH_3^{+}\cdot{}^{-}O_2C{-}I} + (\mathrm{H_2O})_{x+y} \tag{17.6}$$

Both the charged species are initially solvated by water, and to form the salt bridge both ions must be desolvated. This comes at some enthalpic cost, but the freeing of water molecules leads to a concomitant, favorable increase in entropy. The strength of the ion pair will depend on the stability of the salt bridge vs. that of the individual solvated ions. If the salt bridge is buried in a relatively hydrophobic active site, it is less solvated and will be more favored than the same interaction in a solvent-exposed active site.
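
The leveling effect described by Equation 17.5 can be illustrated with the dielectric constants quoted earlier in this section. This is only a sketch: it treats the medium as a uniform continuum, uses arbitrary consistent units, and ignores the desolvation terms of Equation 17.6, so it should not be read as a prediction of real salt-bridge strengths. The names used are illustrative.

```python
# Dielectric constants as quoted in the text.
DIELECTRIC_CONSTANTS = {"water": 80.0, "ethanol": 24.0, "benzene": 2.3}


def electrostatic_force(q1: float, q2: float, r: float, dielectric: float) -> float:
    """Equation 17.5: F = q1*q2 / (r**2 * D), in arbitrary consistent units.

    With this sign convention a negative value means attraction
    (charges of opposite sign).
    """
    if r <= 0 or dielectric <= 0:
        raise ValueError("r and dielectric must be positive")
    return (q1 * q2) / (r ** 2 * dielectric)


def attenuation(reference: str, medium: str) -> float:
    """Factor by which a charge-charge force is weaker in `medium` than in `reference`."""
    return DIELECTRIC_CONSTANTS[medium] / DIELECTRIC_CONSTANTS[reference]


if __name__ == "__main__":
    for solvent in ("benzene", "ethanol"):
        factor = attenuation(solvent, "water")
        print(f"The same ion pair feels a force about {factor:.1f}x weaker in water than in {solvent}")
```

On this simple picture, an ion pair at fixed separation experiences a force roughly 35 times weaker in water than in benzene, in line with the preference for salt bridges buried in hydrophobic active sites.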

2.1.2 van der Waals Forces. Also called nonpolar interactions or London dispersion forces, these are the universal attractive interactions that occur between atoms. As two molecules closely approach each other there is an interpenetration of their electron clouds. As a consequence, temporary local fluctuations in the electron density occur, giving rise to a temporary dipole in each molecule, even though the molecules may, in themselves, have no net dipole moment. Thus there will be an attractive force between the two molecules, with the magnitude of the force depending on the polarizability of the particular atoms involved and the distance between them. Electronegative oxygen has, for example, a much lower polarizability than that of a nonpolar methylene group. Accordingly, dispersion forces are considerably stronger between nonpolar compounds than between nonpolar compounds and water. The optimal distance between the atoms is the sum of each of their van der Waals radii, so these forces come into play only when there is good complementarity between enzyme and inhibitor. Although van der Waals forces are quite weak, usually around 0.5-1.0 kcal/mol for an individual atom-atom interaction, they are additive and can make an important contribution to inhibitor binding.

2.1.3 Hydrophobic Interactions. Hydrophobic interactions may be described as entropy-based forces. When a nonpolar compound is dissolved in water, the strong water-water interactions around the solute lead to an effective "ordering" of the structure of the solvent. This is entropically unfavorable; that is, there is negative entropy of dissolution. When a nonpolar inhibitor binds to a nonpolar region of an enzyme, all the ordered water molecules become less ordered as they associate with bulk solvent, leading to an increase in entropy. According to Equation 17.4 any increase in entropy will lead to a decrease in free energy and, through Equation 17.3, stabilization of the enzyme-inhibitor complex. It has been calculated that a single methylene-methylene interaction releases about 0.7 kcal/mol of free energy. Even though this figure is not high, given that enzymes and inhibitors usually have large regions of hydrophobic surface, this type of bonding may also play a significant role in inhibitor binding.


2.1.4 Hydrogen Bonds. A hydrogen bond occurs when a proton is shared between two electronegative atoms (i.e., X-H···Y). Electron density is pulled from the hydrogen by X, giving the hydrogen a partial positive charge that is strongly attracted to the nonbonded electrons of Y. The bond is usually asymmetric, with one of the heteroatoms, the hydrogen bond donor, having a normal covalent bond distance to the proton. The other heteroatom, the hydrogen bond acceptor, is usually at a distance somewhat shorter than the van der Waals contact distance and, for optimal hydrogen bonding, the atoms should be arranged linearly. A hydrogen bond is a special type of dipole-dipole interaction and, as we have seen, although these forces can be quite significant in nonpolar solvents, water greatly diminishes their magnitude. The energy of the amide-amide N-H···O hydrogen bond is about 5 kcal/mol, and is typical for hydrogen bonds (60).

It should be remembered that, for a hydrogen bond to form between an enzyme and an inhibitor, any hydrogen bonds between the inhibitor and water, as well as those between the enzyme and water, must be broken (Equation 17.7).

$$\mathrm{I{-}H\cdots OH_2 \;+\; E\cdots H{-}OH \;\rightleftharpoons\; I{-}H\cdots E \;+\; HO{-}H\cdots OH_2} \tag{17.7}$$

Overall, the total number of hydrogen bonds remains constant and, provided that the hydrogen bonds between the inhibitor and enzyme are not significantly more favorable than those between water and the inhibitor or those between water and the enzyme, the net change in enthalpy is usually insignificant. On the other hand, formation of the enzyme-inhibitor complex usually leads to an overall increase of entropy because the inhibitor remains bound to the enzyme and the formerly bound water molecules are released.

2.1.5 Cation-π Bonding. Recently it has become apparent that there is another important noncovalent binding force that may be exploited when designing enzyme inhibitors. Cations, from simple ions such as Li⁺ to more complex organic molecules such as acetylcholine, are strongly attracted to the electron-rich (π) face of benzene and other aromatic compounds (61,62). Cation-π bonds, as well as other amino-aromatic interactions, are common in structures in the Protein Data Bank (63), and it has been estimated that more than 25% of tryptophan residues are involved in interactions of this type (64). The finding that the cationic group of acetylcholine was bound primarily by aromatic residues, most especially by a tryptophan residue, not by the expected carboxylate anion, provided evidence that cation-π interactions may play an important role in ligand binding (65, 66). Model systems suggest that, energetically, the cation-π interaction can compete with full aqueous solvation in binding cations (61), and there is now significant effort being expended in studying the contribution of these interactions to molecular recognition (62, 66).

In summary, the $K_i$ provides an indication of the relative stability of the enzyme-inhibitor complex compared to the stability of the enzyme and inhibitor free in solution. Moreover, it is clear that entropy, enthalpy, and water will all have a major impact on the binding of an inhibitor to an enzyme.

2.2 Steady-State Enzyme Kinetics 

Just as an appreciation of the forces involved is essential to comprehending the binding of an inhibitor to an enzyme, so is an understanding of the kinetic analysis of an enzyme-catalyzed reaction essential to any kinetic evaluation of an inhibitor. In this section we provide a brief introduction to the study of enzyme kinetics, particularly steady-state kinetics. The reader is nonetheless advised to refer to other sources for more in-depth reviews of the kinetic equations and mathematical derivations involved (38,60, 67-71).

2.2.1 The Michaelis-Menten Equation. In the simplest case, an enzyme-catalyzed reaction involves the conversion of a single substrate to a single product, as shown in Equation 17.8.

$$\mathrm{E + S \rightleftharpoons E \cdot S \rightleftharpoons E \cdot P \rightleftharpoons E + P} \tag{17.8}$$

The free enzyme (E) binds the substrate (S) to form a noncovalent enzyme-substrate complex (E·S). This is assumed to be a rapid, reversible process, not involving any chemical changes, and with the affinity of the substrate for the enzyme's active site being determined by the binding forces discussed above. A chemical transformation of substrate to product (P), initially in complex with enzyme (E·P), then takes place. Finally, the product (P) is released into the medium with concomitant regeneration of free enzyme (E).


As can be seen from the following discussion, it is not difficult to carry out a kinetic analysis of a single-substrate reaction such as that described in Equation 17.8. However, as more substrates are added the task becomes more complex. Fortunately, kinetic analysis of enzymatic reactions involving two or more substrates can be made easier by varying the concentration of only one substrate at a time. By keeping all but one of the substrates at fixed, saturating concentrations, the reaction rate will depend only on the concentration of the varied substrate. This permits the use of the kinetic analysis employed for enzyme-catalyzed, single-substrate reactions even for complex multisubstrate reactions. In a further simplification, the dissociation of the E·P complex is assumed not to be rate limiting, and the reversion of product to substrate is assumed to be negligible. The latter assumption is valid under what are known as initial velocity conditions, that is, when less than about 5% of substrate has been consumed. Under these conditions, the concentration of P is low, and Equation 17.8 simplifies to Equation 17.9.

$$\mathrm{E + S} \underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}} \mathrm{E \cdot S} \overset{k_{2}}{\longrightarrow} \mathrm{E + P} \tag{17.9}$$

Generally, kinetic analyses are carried 
out by studying the reaction under steady- 
state conditions, that is, when the concen¬ 
tration of the enzyme is well below that of 
the substrate. Under those circumstances, 
following a brief preequilibrium period, the 
concentrations of the various enzyme-bound 
species, E . S and E . P in Equation 17.8, be¬ 
come effectively constant and the rate of 
conversion of substrate to product will 
greatly exceed the change in concentration 
of any enzyme species. This is an approxima¬ 
tion but, provided the substrate concentra¬ 
tion does not greatly change (e.g., under ini¬ 
tial velocity conditions), it is a very useful 
approximation. Given steady-state condi¬ 
tions, the Michaelis-Menten equation 
(Equation 17.10) is a quantitative descrip¬ 
tion of the reaction described by Equation 
17.9. 
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The Michaelis-Menten constant K M is a 
combination of rate constants and is indepen¬ 
dent of enzyme concentration under steady- 
state conditions. It is equal to the substrate 
concentration at which half the maximum ve¬ 
locity of the enzyme-catalyzed reaction is 
reached; that is, when [S] = K M , then v = 
V 2 V max . For the reaction illustrated in Equa¬ 
tion 17.9, i£ M is described by Equation 17.11. 


Figure 17.2. Plot showing dependency of the ini¬ 
tial velocity (v) on substrate concentration [S] for an 
enzyme-catalyzed reaction obeying Michaelis-Men¬ 
ten (saturation) kinetics. 


[SF m „ 
v K» + [S] 

r m „ = k 2 m 


(17.10) 


This implies that the initial velocity (u) is 
directly proportional to the enzyme concen¬ 
tration [E], and that v follows saturation ki¬ 
netics with respect to the substrate concentra¬ 
tion [S]. This is shown graphically in Figure 
17.2 and explained as follows: at very low sub¬ 
strate concentrations v increases in a linear 
fashion, so that v = V max [S]/lf M . As the sub¬ 
strate concentration increases, the observed 
increase in v is less than the increase in [S]. 
This trend continues until, at high (saturat¬ 
ing) substrate concentrations, v becomes ef¬ 
fectively independent of [S] and tends toward 
the limiting value V max . 

V_max is the maximal velocity that can be achieved at a specific
enzyme concentration.

In the simple Michaelis-Menten mechanism described by Equation 17.9,
there is only one E·S complex and all binding steps are rapid. In
this instance, V_max is the product of the enzyme concentration [E]
and k_2 (also known as k_cat), which is the first-order rate constant
for the chemical conversion of the E·S complex to free enzyme and
product. The catalytic constant k_cat is often referred to as the
turnover number because it represents the maximum number of substrate
molecules converted to products per active site per unit time. In a
more complicated reaction, k_cat is a function of all the first-order
rate constants and, effectively, sets a lower limit on all the
chemical rate constants.

$$K_M = \frac{k_2 + k_{-1}}{k_1} \tag{17.11}$$


If, for a given reaction, k_-1 ≫ k_2, then Equation 17.11 simplifies
to K_M = K_S, where K_S (= k_-1/k_1) is the dissociation constant for
the enzyme-substrate complex. It is important to remember that the
Michaelis-Menten equation holds true not only for the mechanism as
stated above, but for many different mechanisms that are not included
in this treatment. In summary, K_M can be described as an apparent
dissociation constant for all enzyme-bound species and, in all cases,
it is the substrate concentration at which the enzyme operates at
half-maximal velocity.

Another parameter often referred to when discussing Michaelis-Menten
kinetics is k_cat/K_M. This is an apparent second-order rate constant
that relates the reaction rate to the free (not total) enzyme
concentration. As described above, at very low substrate
concentrations when the enzyme is predominantly unbound, the velocity
(v) is equal to [E][S]k_cat/K_M. The value of k_cat/K_M sets a lower
limit on the rate constant for the association of enzyme and
substrate. It is sometimes referred to as the specificity constant
because it determines the specificity of the enzyme for competing
substrates.

Again, for more detailed treatment of this 
subject the reader should refer to more spe¬ 
cialized texts (38,60, 67-69). 


2.2.2 Treatment of Kinetic Data. Analysis of Michaelis-Menten
kinetics is greatly facilitated by a linear representation of the
data. Converting the Michaelis-Menten Equation 17.10 into Equation
17.12 leads to the popular Lineweaver-Burk plot.

$$\frac{1}{v} = \frac{1}{V_{max}} + \frac{K_M}{V_{max}[S]} \tag{17.12}$$

If 1/v is plotted against 1/[S] (Fig. 17.3), the y-intercept gives a
value of 1/V_max and the x-intercept gives a value of -1/K_M. The
slope of the line is equal to K_M/V_max. Although very popular, the
Lineweaver-Burk plot suffers from the disadvantage that it emphasizes
points at lower concentrations and compresses data points obtained at
high concentrations (67). As a result it is not recommended for
obtaining accurate kinetic constants.

Figure 17.3. The Lineweaver-Burk plot.

A preferable, alternative form of the 
Michaelis-Menten equation is that of the 
Eadie-Hofstee plot (Equation 17.13) 


$$v = V_{max} - \frac{K_M v}{[S]} \tag{17.13}$$

As shown in Fig. 17.4, plotting v against v/[S] results in the
y-intercept providing a value of V_max, whereas the x-intercept
provides V_max/K_M, and the slope of the line is equal to -K_M.

Another linear representation of the 
Michaelis-Menten equation is the Hanes- 
Woolf plot (Equation 17.14). 


$$\frac{[S]}{v} = \frac{[S]}{V_{max}} + \frac{K_M}{V_{max}} \tag{17.14}$$

Thus a plot of [S]/v vs. [S] is linear, with a slope of 1/V_max
(Fig. 17.5). The y-intercept gives K_M/V_max and the x-intercept
gives -K_M.

Figure 17.4. The Eadie-Hofstee plot.

Figure 17.5. The Hanes-Woolf plot.

Finally, it is possible to plot pairs of v, [S] data in such a way as
to determine K_M and V_max values directly. Of the linear graphical
methods, the direct linear plot of Eisenthal and Cornish-Bowden (72),
shown in Fig. 17.6, is often considered to provide the best estimates
of K_M and V_max values. In this method pairs of v and [S] values are
obtained in the usual manner. A v value is plotted on the y-axis and
the corresponding negative value of [S] is plotted on the x-axis. A
straight line is then drawn, passing through the points on the two
axes and extending beyond the "point of intersection." This is
repeated for each set of v and [S] values. Thus, there are n lines
for n pairs of values. A horizontal line drawn from the point of
intersection to the y-axis provides the V_max value, whereas a
vertical line from the point of intersection to the x-axis provides
the K_M value.

Figure 17.6. Eisenthal-Cornish-Bowden direct linear plot of enzyme
kinetic data fitting the Michaelis-Menten equation.

Each of these linear plots has its own merits, particularly for
plotting inhibition data (38), and its own drawbacks (67). However,
the rapid advances in personal computing 
make it relatively easy to fit kinetic data to the 
Michaelis-Menten equation (or other appro¬ 
priate hyperbolic functions) by use of a variety 
of commercial graphical or spreadsheet pack¬ 
ages. One simple package, HYPER, which is 
readily available on the Internet (http:// 
www.ibiblio.org/pub/academic/biology/molbio/ 
ibmpc/hyperl02.zip), simultaneously fur¬ 
nishes Michaelis-Menten parameters ob¬ 
tained using hyperbolic regression analysis, as 
well as those obtained using three of the plots 
described here. As such, it provides a rapid 
contrast of these graphical methods but, un¬ 
fortunately, is not suitable for the study of in¬ 
hibition kinetics. In addition, the recent 
monograph by Copeland (71) provides a list of 
useful computer software and Internet sites 
for the study of enzymes. 
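
As an illustration of how the three linear transformations (Equations 17.12 to 17.14) recover the same constants, the sketch below fits a least-squares line to each transformed data set. It is not the HYPER program mentioned above; the function names, the substrate concentrations, and the parameter values are all assumptions made for the example, and the data are noise-free so that every method returns the same answer. With real, noisy data the three estimates generally differ, which is the point made in the text.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def lineweaver_burk(s, v):
    """1/v vs 1/[S]: intercept = 1/Vmax, slope = Km/Vmax (Eq. 17.12)."""
    slope, intercept = fit_line([1.0 / x for x in s], [1.0 / y for y in v])
    vmax = 1.0 / intercept
    return vmax, slope * vmax

def eadie_hofstee(s, v):
    """v vs v/[S]: intercept = Vmax, slope = -Km (Eq. 17.13)."""
    slope, intercept = fit_line([y / x for x, y in zip(s, v)], v)
    return intercept, -slope

def hanes_woolf(s, v):
    """[S]/v vs [S]: slope = 1/Vmax, intercept = Km/Vmax (Eq. 17.14)."""
    slope, intercept = fit_line(s, [x / y for x, y in zip(s, v)])
    vmax = 1.0 / slope
    return vmax, intercept * vmax

# Assumed, noise-free example data generated from Equation 17.10.
TRUE_VMAX, TRUE_KM = 100.0, 2.0
substrate = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
velocity = [TRUE_VMAX * x / (TRUE_KM + x) for x in substrate]

lb = lineweaver_burk(substrate, velocity)
eh = eadie_hofstee(substrate, velocity)
hw = hanes_woolf(substrate, velocity)
print(lb, eh, hw)
```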

2.3 Rapid, Reversible Inhibitors 

This class of inhibitors acts by binding to the 
target enzyme's active site in a rapid, revers¬ 
ible, and noncovalent fashion. The net result 
is that the active site is blocked and the sub¬ 
strate is prevented from binding. Accordingly, 
in designing inhibitors of this type, optimiza¬ 
tion of the noncovalent binding forces be¬ 
tween the inhibitor and the active site of the 
enzyme is of paramount importance. 


2.3.1 Types of Rapid, Reversible Inhibitors. Binding of these
inhibitors follows simple Michaelis-Menten kinetics and, depending on
their preference of binding to the free enzyme and/or the
enzyme-substrate complex, competitive, uncompetitive, and
noncompetitive inhibition patterns can be distinguished. For the
purposes of this discussion it will be assumed that the initial
equilibrium of free and bound substrate is established significantly
faster than the rate of the chemical transformation of substrate to
product, that is, k_1, k_-1 ≫ k_2 (Equation 17.9). As discussed in
Section 2.2.1, this reduces K_M to the dissociation constant K_S of
the E·S complex.

2.3.1.1 Competitive Inhibitors. A competitive inhibitor often has
structural features similar to those of the substrate whose reaction
it inhibits. This means that a competitive inhibitor and the enzyme's
substrate are in direct competition for the same binding site on the
enzyme. Consequently, binding of the substrate and the inhibitor are
mutually exclusive. A kinetic scheme for competitive inhibition is
shown in Equation 17.15.

$$E \overset{S}{\rightleftharpoons} E\cdot S \rightarrow E + P, \qquad E \overset{I}{\rightleftharpoons} E\cdot I \tag{17.15}$$

The enzyme-bound inhibitor may either lack an appropriate functional
group for further reaction, or may be bound in the wrong position
with respect to the catalytic residues or to other substrates. In any
event, the enzyme-inhibitor complex E·I is unreactive (it is
sometimes referred to as a dead-end complex) and the inhibitor must
dissociate and substrate bind before reaction can take place.

Solving this kinetic scheme for simple Michaelis-Menten kinetics
leads to Equation 17.16.

$$v = \frac{[S]V_{max}}{[S] + K_M\left(1 + \frac{[I]}{K_i}\right)} \tag{17.16}$$


Here, K_i, sometimes called the inhibition constant, is the
equilibrium constant for the dissociation of the enzyme-inhibitor
complex, and is described by Equation 17.17.

$$K_i = \frac{[E][I]}{[E\cdot I]} \tag{17.17}$$

Competitive inhibitors do not change the value of V_max, which is
reached when sufficiently high concentrations of the substrate are
present so as to completely displace the inhibitor. However, the
affinity of the substrate for the enzyme appears to be decreased in
the presence of a competitive inhibitor. This happens because the
free enzyme E is not only in equilibrium with the enzyme-substrate
complex E·S, but also with the enzyme-inhibitor complex E·I.
Competitive inhibitors increase the apparent K_M of the substrate by
a factor of (1 + [I]/K_i). The evaluation of the kinetics is again
greatly facilitated by the conversion of Equation 17.16 into a linear
form using Lineweaver-Burk, Eadie-Hofstee, or Hanes-Woolf plots, as
shown in Fig. 17.7.

2.3.1.2 Uncompetitive Inhibitors. Uncom¬ 
petitive inhibitors do not bind to the free en¬ 
zyme. They bind only to the enzyme-substrate 
complex to yield an inactive E • S • I complex 
(Equation 17.18). 





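$$E \overset{S}{\rightleftharpoons} E\cdot S \rightarrow E + P, \qquad E\cdot S \overset{I}{\rightleftharpoons} E\cdot S\cdot I \tag{17.18}$$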


Figure 17.7. (a) Lineweaver-Burk, (b) Eadie-Hofstee, and (c)
Hanes-Woolf plots exhibiting competitive inhibition patterns. The
dashed line indicates the reaction in the absence of inhibitor,
whereas the solid lines represent enzymatic reactions in the presence
of increasing concentrations of inhibitor.


Uncompetitive inhibition is rarely ob¬ 
served in single-substrate reactions but is 
frequently observed in multisubstrate reac¬ 
tions. An uncompetitive inhibitor can pro¬ 
vide information about the order of binding 
of the different substrates. In a bisubstrate- 
catalyzed reaction, for example, a given in¬ 
hibitor may be competitive with respect to 
one of the two substrates and uncompetitive 
with respect to the other. The linear plots 
for classical uncompetitive inhibition pat¬ 
terns are described by Equation 17.19 and 
are illustrated in Fig. 17.8. 


$$v = \frac{\dfrac{[S]V_{max}}{1 + [I]/K_i}}{[S] + \dfrac{K_M}{1 + [I]/K_i}} \tag{17.19}$$


Unlike a competitive inhibitor, which increases the apparent K_M, an
uncompetitive inhibitor decreases the apparent K_M for the substrate
by a factor of (1 + [I]/K_i), because the formation of E·S·I uses up
some of the E·S, thereby shifting the equilibrium further in favor of
E·S formation. However, uncompetitive inhibitors also decrease V_max
by the same factor because some of the enzyme remains in the E·S·I
form, even at infinite substrate concentration.

Figure 17.8. (a) Lineweaver-Burk, (b) Eadie-Hofstee, and (c)
Hanes-Woolf plots exhibiting uncompetitive inhibition patterns. The
dashed line indicates the reaction in the absence of inhibitor,
whereas the solid lines represent enzymatic reactions in the presence
of increasing concentrations of inhibitor.

2.3.1.3 Noncompetitive Inhibitors. Classical noncompetitive
inhibitors have no effect on substrate binding and vice versa, given
that they bind randomly and reversibly to different sites on the
enzyme. They also bind with the same affinity to the free enzyme and
to the enzyme-substrate complex. Both the enzyme-inhibitor complex
E·I and the enzyme-substrate-inhibitor complex E·S·I are
catalytically inactive. The equilibria are outlined in Equation
17.20.

$$\begin{array}{ccccc} E & \overset{S}{\rightleftharpoons} & E\cdot S & \rightarrow & E + P \\ \updownarrow & & \updownarrow & & \\ E\cdot I & \rightleftharpoons & E\cdot S\cdot I & & \end{array} \tag{17.20}$$

Simple Michaelis-Menten kinetics of noncompetitive inhibitors are
described in Equation 17.21.

$$v = \frac{\dfrac{[S]V_{max}}{1 + [I]/K_i}}{[S] + K_M} \tag{17.21}$$


From Equation 17.21 it is clear that noncompetitive inhibitors have
an effect only on V_max, decreasing it by a factor of (1 + [I]/K_i),
consequently giving the impression of reducing the total amount of
enzyme present. As with an uncompetitive inhibitor, a portion of the
enzyme will always be bound in the nonproductive
enzyme-substrate-inhibitor complex E·S·I, causing a decrease in
maximum velocity, even at infinite substrate concentrations. However,
because noncompetitive inhibitors do not affect substrate binding,
the K_M value of the substrate remains unchanged. Linear plots for
noncompetitive inhibition are shown in Fig. 17.9.
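
The effects of the three classical inhibitor types on the apparent kinetic constants, as given by Equations 17.16, 17.19, and 17.21, can be summarized in a few lines. This is a sketch only: the function names and the numerical values are assumptions chosen for illustration, and mixed-type inhibition is deliberately left out.

```python
def apparent_parameters(kind, vmax, km, inhibitor, ki):
    """Return (apparent Vmax, apparent Km) for a classical inhibitor type."""
    factor = 1.0 + inhibitor / ki
    if kind == "competitive":        # Eq. 17.16: Km raised, Vmax unchanged
        return vmax, km * factor
    if kind == "uncompetitive":      # Eq. 17.19: both lowered by the factor
        return vmax / factor, km / factor
    if kind == "noncompetitive":     # Eq. 17.21: Vmax lowered, Km unchanged
        return vmax / factor, km
    raise ValueError("unknown inhibition type: " + kind)

def inhibited_velocity(kind, s, vmax, km, inhibitor, ki):
    """Initial velocity using the apparent constants in Equation 17.10."""
    vmax_app, km_app = apparent_parameters(kind, vmax, km, inhibitor, ki)
    return vmax_app * s / (km_app + s)

# Assumed example values: [I] = Ki, so the factor (1 + [I]/Ki) equals 2.
VMAX, KM, INHIBITOR, KI = 100.0, 2.0, 1.0, 1.0
for kind in ("competitive", "uncompetitive", "noncompetitive"):
    print(kind, apparent_parameters(kind, VMAX, KM, INHIBITOR, KI))
```

Writing each case in terms of apparent V_max and K_M keeps the rate law in the familiar form of Equation 17.10, which is also why the linear plots in Figs. 17.7 to 17.9 remain straight lines in the presence of inhibitor.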

Again, this type of inhibition is rarely seen 
in single-substrate reactions. It should also be 
noted that, frequently, the affinity of the non¬ 
competitive inhibitor for the free enzyme, and 
the enzyme-substrate complex, are different. 
These nonideally behaving noncompetitive in¬ 
hibitors are called mixed-type inhibitors, and 
they alter not only V_max but also K_M for the
substrate. Further discussion of inhibitors of
this type may be found in Segel (38).

Figure 17.9. (a) Lineweaver-Burk, (b) Eadie-Hofstee, and (c)
Hanes-Woolf plots exhibiting noncompetitive inhibition patterns. The
dashed line indicates the reaction in the absence of inhibitor,
whereas the solid lines represent enzymatic reactions in the presence
of increasing concentrations of inhibitor.

Sometimes steady-state kinetics are insufficient to analyze the
mechanism of inactivation for a given inhibitor. For example,
irreversible enzyme inhibitors that bind so tightly to the enzyme
that their dissociation rate (k_off) is effectively zero also exhibit
noncompetitive inhibition patterns. They act by destroying a portion
of the enzyme through irreversible binding, thereby lowering the
overall enzyme concentration and decreasing V_max. The apparent K_M
remains unaffected because irreversible inhibitors do not influence
the dissociation constant of the enzyme-substrate complex. A simple
experiment to distinguish between a reversible noncompetitive
inhibitor and an irreversible inhibitor is shown in Fig. 17.10, and a
comprehensive review describing the kinetic evaluation of
irreversibly binding enzyme inhibitors is available (73). Allosteric
effectors may also show noncompetitive kinetic patterns by rendering
the enzyme in the E·S·I complex less active than that in the E·S
complex. Again, additional analyses are often required in these less
well defined



Figure 17.10. Plot showing dependency of F max on 
the total enzyme concentration,[El„„. An irrevers¬ 
ible inhibitor will titrate a fraction of the enzyme 
[Eli***. 

situations. Such analyses may include more 
in-depth steady-state kinetics, as well as pre- 
steady-state kinetics, and testing for irrevers¬ 
ible inhibition. Irreversible covalently binding 
enzyme inhibitors are discussed extensively 
later in this chapter. 

2.3.2 Dixon Plots. Another linear method for plotting inhibition
data, the Dixon plot, is shown in Fig. 17.11 (74). In this method the
initial velocity is measured as a function of inhibitor concentration
at two or more fixed substrate concentrations. By plotting 1/v
against [I] for each substrate concentration, the different types of
inhibition can easily be distinguished. Further, in cases of
competitive or noncompetitive inhibition, the value of K_i may be
determined from the x-axis value at which the lines intersect, which
corresponds to [I] = -K_i. Overall, the Dixon plot is probably the
simplest and most rapid graphical method for obtaining a K_i value.
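
The intersection property of the Dixon plot can be verified for the competitive case directly from Equation 17.16. The sketch below uses assumed parameter values and two assumed substrate concentrations; because the data are noise-free, each Dixon line is constructed from just two points rather than by regression.

```python
def competitive_velocity(s, inhibitor, vmax, km, ki):
    """Equation 17.16 for a competitive inhibitor."""
    return vmax * s / (s + km * (1.0 + inhibitor / ki))

def dixon_line(s, i_low, i_high, vmax, km, ki):
    """Slope and intercept of 1/v versus [I] at one fixed [S]."""
    y_low = 1.0 / competitive_velocity(s, i_low, vmax, km, ki)
    y_high = 1.0 / competitive_velocity(s, i_high, vmax, km, ki)
    slope = (y_high - y_low) / (i_high - i_low)
    return slope, y_low - slope * i_low

def intersection(line_a, line_b):
    """Point (x, y) where two non-parallel lines cross."""
    (m1, c1), (m2, c2) = line_a, line_b
    x = (c2 - c1) / (m1 - m2)
    return x, m1 * x + c1

# Assumed illustrative values, arbitrary units.
VMAX, KM, KI = 100.0, 2.0, 0.5
line_s1 = dixon_line(1.0, 0.0, 2.0, VMAX, KM, KI)   # [S] = 1
line_s4 = dixon_line(4.0, 0.0, 2.0, VMAX, KM, KI)   # [S] = 4

x_cross, y_cross = intersection(line_s1, line_s4)
estimated_ki = -x_cross
print(estimated_ki, y_cross)
```

For competitive inhibition the lines cross above the x-axis, at a height of 1/V_max; this follows from setting [I] = -K_i in the reciprocal of Equation 17.16.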

Figure 17.11. Dixon plots for (a) competitive, (b) uncompetitive, and
(c) noncompetitive inhibitors. The solid lines represent enzymatic
reactions in the presence of increasing concentrations of substrate.
The dashed line represents the reaction at infinite substrate
concentration.

2.3.3 IC_50 Values. The potencies of enzyme inhibitors evaluated
using rapid screening techniques are often reported in terms of IC_50
values rather than K_i values. An IC_50 value is the inhibitor
concentration that is required to halve the activity of the enzyme,
that is, the concentration that leads to 50% inhibition. It is
important to recognize that an IC_50 value is not a constant, except
in the case of noncompetitive inhibition, and is dependent on the
substrate concentration used in the experiment. IC_50 values are
commonly determined by keeping the concentration of the substrate and
the enzyme constant and incrementally varying the concentration of
the inhibitor. This simple experimental approach makes it relatively
easy to screen large numbers of potential inhibitors. Industrial
high-throughput screens often employ half-log increments, and the
value of IC_50 provides a ready means of comparing the extent of
inhibition. It should also be noted that the IC_50 value can be no
less than half the concentration of the enzyme, a factor that becomes
important if the inhibitor is very potent or if high concentrations
of enzyme are employed.

For a competitive inhibitor, a K_i value may be obtained using the
relationship described by Equation 17.22.

$$IC_{50} = K_i\left(1 + \frac{[S]}{K_M}\right) \tag{17.22}$$

Provided that a sufficiently low substrate concentration (<0.1 K_M)
is employed for the experiment, the IC_50 value may be a reasonable
approximation of the true K_i. Equation 17.22 indicates that
substrate concentrations greater than about 0.1-fold of the K_M value
will lead to an underestimation of the inhibitor's affinity (the
IC_50 value exceeds K_i), an underestimation that becomes quite
significant at high substrate concentrations.

The dependency of the IC_50 value on the substrate concentration for
uncompetitive inhibitors is given in Equation 17.23.

$$IC_{50} = K_i\left(1 + \frac{K_M}{[S]}\right) \tag{17.23}$$

In this instance it is at high concentrations of the substrate that
the K_i value is comparable to the IC_50 value, and a significant
underestimation will occur at lower substrate concentrations.

From these two equations it is clear that, for preliminary screening
when the type of inhibition is unknown, substrate concentrations
close to the K_M value should be used. This minimizes the deviation
of the IC_50 value from the K_i value to, in the cases of competitive
and uncompetitive inhibitors, a factor of 2. If necessary, a Dixon
plot can be used to provide a quick indication of the K_i and the
type of inhibition (38, 74). It should be noted that the relationship
between IC_50 and K_i requires the initial velocity to be linearly
dependent on the concentration of inhibitor. In the cases of
mixed-competitive and irreversible inhibitors, the dependence of the
initial velocity on the inhibitor concentration is nonlinear.
Therefore, in those cases, the use of the IC_50 value is limited.
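
Equations 17.22 and 17.23 are easily rearranged to convert a measured IC_50 into a K_i once the inhibition type is known. The following sketch assumes illustrative concentrations (the function names and numbers are not from the text) and shows the factor-of-2 deviation at [S] = K_M noted above.

```python
def ki_from_ic50_competitive(ic50, s, km):
    """Rearranged Equation 17.22: Ki = IC50 / (1 + [S]/Km)."""
    return ic50 / (1.0 + s / km)

def ki_from_ic50_uncompetitive(ic50, s, km):
    """Rearranged Equation 17.23: Ki = IC50 / (1 + Km/[S])."""
    return ic50 / (1.0 + km / s)

# Assumed example: IC50 of 10 (arbitrary units) and Km of 2.
IC50, KM = 10.0, 2.0

# At [S] = Km both inhibitor types give Ki = IC50 / 2.
ki_comp_at_km = ki_from_ic50_competitive(IC50, KM, KM)
ki_uncomp_at_km = ki_from_ic50_uncompetitive(IC50, KM, KM)

# At [S] = 0.1 Km the competitive IC50 is within about 10% of Ki.
ki_comp_low_s = ki_from_ic50_competitive(IC50, 0.1 * KM, KM)

print(ki_comp_at_km, ki_uncomp_at_km, ki_comp_low_s)
```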









2.3.4 Examples of Rapid Reversible Inhibitors. Competitive inhibitors
are often similar in structure to one of the substrates of the
reaction they are inhibiting. Inhibitors of this type are sometimes
called substrate analogs and their binding affinity (K_i) usually
approximates that of the substrate. One of the first reactions
inhibited by a substrate analog was that catalyzed by succinate
dehydrogenase (Equation 17.24).

$$\underset{\text{succinate}}{{}^{-}O_2C{-}CH_2{-}CH_2{-}CO_2^{-}} \xrightarrow{\text{succinate dehydrogenase}} \underset{\text{fumarate}}{trans\text{-}{}^{-}O_2C{-}CH{=}CH{-}CO_2^{-}} \tag{17.24}$$

This reaction is competitively inhibited by malonate (⁻OOCCH₂COO⁻)
that has, like succinate, two carboxylate groups. It is therefore
able to bind to the enzyme's active site but, with only one carbon
atom between the carboxylates, further reaction is impossible.

Substrate analogs are rarely useful as enzyme inhibitors, given that
large concentrations are required for inhibition, and their
inhibition is readily overcome by any buildup of substrate. However,
they are often useful probes for determining enzyme specificity and
even mechanism. Phenylethanolamine N-methyltransferase (PNMT)
catalyzes the terminal step in epinephrine (adrenaline) biosynthesis,
the conversion of norepinephrine to epinephrine (Equation 17.25),
with concomitant conversion of S-adenosyl-L-methionine (SAM, AdoMet)
to S-adenosyl-L-homocysteine (SAH, AdoHcy).

S-Adenosyl-L-homocysteine (10) (Fig. 17.12), the product of the
reaction, and 2-(2,5-dichlorophenyl)cyclopropylamine (11) are analogs
of S-adenosyl-L-methionine and norepinephrine, respectively. Using
these inhibitors it was possible to ascertain the binding order of
the two substrates (75). Kinetic analyses showed that SAH was a
competitive inhibitor of SAM and a noncompetitive inhibitor of
norepinephrine, whereas (11) was a competitive inhibitor of
norepinephrine and an uncompetitive inhibitor of SAM. This indicates
that the binding of substrates is ordered, with SAM binding first. If
norepinephrine bound first, it would be expected that SAH would be an
uncompetitive inhibitor and (11) would be noncompetitive with respect
to SAM. If a random binding mechanism were in operation, it would be
expected that both inhibitors would be competitive with either
substrate. More detail on similar uses of reversible inhibitors may
be found elsewhere (76).

$$\text{Norepinephrine} + \text{S-adenosyl-L-methionine} \xrightarrow{\text{PNMT}} \text{Epinephrine} + \text{S-adenosyl-L-homocysteine} \tag{17.25}$$

Figure 17.12. Inhibitors of phenylethanolamine N-methyltransferase.

2.4 Slow-, Tight-, and Slow-Tight-Binding 
Inhibitors 

Not all reversible inhibitors have an instanta¬ 
neous effect on the rate of an enzymatic reac¬ 
tion. Some inhibitors, known as slow-binding 
enzyme inhibitors, can take a considerable 
time to establish the equilibrium between the 
free enzyme and inhibitor, and the enzyme- 
inhibitor complex. This time period may be on 
the scale of seconds, minutes, or even longer. 
The enzyme-inhibitor complexes have slow off 
(dissociation) rates, but the on (association) 
rates may be either slow or fast. Hence, the 
term slow binding does not necessarily indi¬ 
cate a slow binding of inhibitor to enzyme but 
rather the fact that reaching equilibrium is a 


slow process. Other inhibitors, known as 
tight-binding inhibitors, bind their target en¬ 
zyme with such high affinity that the popula¬ 
tion of free inhibitor molecules is significantly 
depleted when the enzyme-inhibitor complex 
is formed. Often, tight-binding inhibitors also 
have a slow onset of action, and are termed 
slow-tight-binding inhibitors. What these 
three types of inhibitors have in common is 
that, generally, the major assumptions of
Michaelis-Menten kinetics do not hold true.

As with rapid reversible inhibition, for 
slow-binding inhibition to take place a signif¬ 
icantly larger concentration of inhibitor than 
enzyme is required. However, reaching equi¬ 
librium slowly is incompatible with the as¬ 
sumption of Michaelis-Menten kinetics that 
inhibitors bind much more quickly than the 
enzyme turns over. Unlike rapid reversible 
and slow-binding inhibitors, both tight-bind¬ 
ing and slow-tight-binding inhibitors are ef¬ 
fective at concentrations comparable to that of 
the enzyme. At that point, the inhibitor con¬ 
centration is no longer independent of the en¬ 
zyme concentration, as assumed for Michae¬ 
lis-Menten kinetics. A summary of the 
properties of reversible enzyme inhibitors is 
shown in Table 17.5. Although we give a brief 
overview of these types of inhibitors, excellent 
and more in-depth descriptions of slow-, 
tight-, and slow-tight-binding inhibitors have 
appeared elsewhere (71,77-80). 

Table 17.5 Classes of Reversible Inhibitors

Inhibitor Class       Ratio of Inhibitor to Enzyme   Rate at Which Equilibrium
                      Necessary for Inhibition       Is Attained (E + I ⇌ E·I)
Rapid, reversible     I ≫ E                          Fast
Tight binding         I ≈ E                          Fast
Slow binding          I ≫ E                          Slow
Slow-tight binding    I ≈ E                          Slow

2.4.1 Slow-Binding Inhibitors. Two different mechanisms have been
suggested to rationalize the slow-binding behavior of competitive
inhibitors (71, 78, 80). In the one-step mechanism A, the direct
binding of the inhibitor to the enzyme is slow (Equation 17.26); that
is, the magnitude of k_3[I] is small relative to k_1[S] and k_2, the
rate constants for the conversion of substrate to product.

$$E \underset{k_{-1}}{\overset{S,\ k_1}{\rightleftharpoons}} E\cdot S \xrightarrow{k_2} E + P, \qquad E \underset{k_{-3}}{\overset{I,\ k_3}{\rightleftharpoons}} E\cdot I \tag{17.26}$$


The slow on rate (k_3) has been attributed to the inhibitor
encountering some barrier to binding at the active site. The
inhibitor has to overcome this barrier by correct alignment. Once
aligned properly, it binds so tightly that it is released very slowly
from the enzyme, making the overall equilibrium process extremely
slow. The equilibrium dissociation constant for the E·I complex, K_i,
derived from Equation 17.26, is given by Equation 17.27.


$$K_i = \frac{[E][I]}{[E\cdot I]} = \frac{k_{-3}}{k_3} \tag{17.27}$$

This is the same equilibrium as that for a rapid reversible inhibitor
(Equation 17.17). From Equation 17.27, it should be noted that, if
K_i is very small (as with a tight-binding inhibitor) and [I] is
varied in the region of K_i, even if the on rate (k_3) is diffusion
controlled, both k_3[I] and k_-3 will be very small. Thus, the onset
of inhibition for a tight-binding inhibitor can appear to be slow,
even though k_3 is in the range expected for rapid reversible
inhibitors (78). It is possible to carry out kinetic analyses of
tight-binding inhibitors. This can be done either by including a
preincubation step, to allow sufficient time for the enzyme and
inhibitor to reach equilibrium, or by carrying out the reaction at
very high concentrations of both substrate and inhibitor. More
detailed discussion of these methods, with appropriate references,
can be found in a recent volume by Copeland (71).

If the slow-binding inhibitor described by Equation 17.26 also binds
very tightly, it is referred to as a slow-tight-binding inhibitor.
For inhibitors of this type, K_i is given by Equation 17.28, where
[E_T] represents the total enzyme concentration (in all forms)
present in solution.

$$K_i = \frac{([E_T] - [E\cdot I])([I] - [E\cdot I])}{[E\cdot I]} \tag{17.28}$$

For slow-tight-binding inhibitors, k_-3 is very small and formation
of the E·I complex is essentially irreversible. Use of Equation 17.28
ensures that depletion of free enzyme and free inhibitor by formation
of the E·I complex is taken into account.

In mechanism B, the more common mechanism for slow-binding inhibition
(80), the initial equilibrium between the enzyme, inhibitor, and the
E·I complex is fast. However, there is a subsequent slow
rearrangement to form the final, more stable enzyme-inhibitor complex
(E·I*) (Equation 17.29).

$$E \underset{k_{-1}}{\overset{S,\ k_1}{\rightleftharpoons}} E\cdot S \xrightarrow{k_2} E + P, \qquad E \underset{k_{-3}}{\overset{I,\ k_3}{\rightleftharpoons}} E\cdot I \underset{k_{-4}}{\overset{k_4}{\rightleftharpoons}} E\cdot I^{*} \tag{17.29}$$

Here the dissociation constant for the initial E·I complex is still
k_-3/k_3, but there is also a dissociation constant for the formation
of the E·I* complex. The second dissociation constant is given by
Equation 17.30.

$$K_i^{*} = \frac{K_i k_{-4}}{k_4 + k_{-4}} = \frac{[E][I]}{[E\cdot I] + [E\cdot I^{*}]} \tag{17.30}$$


To observe the slow onset of inhibition and the E·I complex, K_i*
must be smaller than K_i and k_-4 smaller than k_4. However, if k_-4
is considerably smaller than k_4, then the formation of the E·I*
complex will be effectively irreversible (i.e., the inhibitor is of
the slow-tight-binding variety). Under those circumstances it will
again be necessary to take depletion of free enzyme and free
inhibitor into account when determining K_i and K_i* (78).
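
The need to account for depletion can be illustrated by solving Equation 17.28 for the concentration of the E·I complex. This is a sketch under stated assumptions: [I] in Equation 17.28 is taken to be the total inhibitor concentration (the text defines only [E_T] explicitly), substrate is ignored, and the concentrations are invented for the example. Rearranging gives a quadratic in [E·I], of which the smaller root is the physically meaningful one.

```python
import math

def bound_complex(e_total, i_total, ki):
    """[E.I] from Equation 17.28, taking the smaller root of
    x**2 - (E_T + I_T + Ki)*x + E_T*I_T = 0."""
    b = e_total + i_total + ki
    return (b - math.sqrt(b * b - 4.0 * e_total * i_total)) / 2.0

def bound_complex_no_depletion(e_total, i_total, ki):
    """[E.I] if free inhibitor is assumed equal to total inhibitor,
    as is done for rapid reversible inhibitors present in large excess."""
    return e_total * i_total / (ki + i_total)

# Assumed example: enzyme and inhibitor both 10 nM, Ki = 1 nM.
E_T, I_T, KI = 10.0, 10.0, 1.0

ei = bound_complex(E_T, I_T, KI)
ei_naive = bound_complex_no_depletion(E_T, I_T, KI)
free_inhibitor_fraction = (I_T - ei) / I_T

print(ei, ei_naive, free_inhibitor_fraction)
```

With these numbers most of the inhibitor is bound, so treating the free inhibitor concentration as equal to the total would overstate the amount of complex formed.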

The slow rearrangement step has been correlated with conformational
changes of the enzyme following initial binding of the inhibitor. It
is possible that the enzyme in its transition state conformation may
be better equipped to accommodate the inhibitor. A slow change to
reach this optimal conformation will lead to tighter binding of the
inhibitor and even slower release from the enzyme. An alternative
suggestion is that the slow-binding process is linked to a requisite
displacement of water molecules from the active site (81). Initially
the inhibitor binds loosely to the enzyme, but upon release of water
molecules the gain in entropy leads to a more stable E·I* complex.

One way of quickly identifying a potential slow-binding inhibitor is
to examine the progress of the reaction at increasing concentrations
of inhibitor. Under initial velocity conditions (Section 2.2.1), an
enzyme-catalyzed reaction will exhibit a linear increase in the
amount of product formed over time. A reaction progress plot for a
reaction carried out in the presence of a rapid reversible inhibitor
will also be linear. However, a slow-binding inhibitor will initially
show a linear relationship, although this will change as the
inhibitor binds, resulting in a biphasic plot. Typical biphasic
progress curves for a reaction in the presence of increasing
concentrations of a slow-binding inhibitor are shown in Fig. 17.13.
The initial burst of the reaction, the linear section of the graphs,
can be described by competitive Michaelis-Menten kinetics. The higher
the concentration of the inhibitor, the shorter the initial linear
section of each curve and the slower the subsequent final
steady-state rate, as observed in the asymptotes in Fig. 17.13. If
the inhibitor concentration is small, the substrate might be too
depleted to permit observation of steady-state rates.

Figure 17.13. Reaction progress curves in the presence of increasing
concentrations of a slow-binding inhibitor.

A good comparison of rapid reversible and slow-binding inhibition can
be found in a recent study on the inhibition of arginase, an enzyme
that catalyzes the hydrolysis of L-arginine to yield L-ornithine and
urea (Equation 17.31).

$$\text{L-arginine} + H_2O \xrightarrow{\text{arginase}} \text{L-ornithine} + \text{urea} \tag{17.31}$$


Arginase competes with nitric oxide synthase (NOS) for arginine and,
in doing so, helps regulate NOS. As a consequence, inhibitors of
arginase may have therapeutic use in treating NO-dependent smooth
muscle disorders, including erectile dysfunction (82). A series of
arginine analogs were prepared and tested as inhibitors of arginase
(83). Three examples are shown in Fig. 17.14. One of these,
N^ω-hydroxy-L-arginine (12), is a competitive inhibitor of arginase
at both pH 7.5 and pH 9.5 with K_i values of 2 and 1.6 μM,
respectively. The two boronic acid derivatives,
2(S)-amino-6-boronohexanoic acid (13) and
S-(2-boronoethyl)-L-cysteine (14), were also competitive inhibitors
at pH 7.5 with K_i values of 0.25 and 0.31 μM, respectively. However,
at pH 9.5, the boronic acid derivatives both became slow-binding
inhibitors, apparently binding by mechanism B and with lowered K_i
values of 8.5 and 30 nM, respectively. It was suggested that, at low
pH, the trigonal form of the boronic acid derivative predominates,
and that this species binds with one hydroxyl, coordinating to one of
the two requisite manganese ions. At pH 9.5 the tetrahedral species
is the major form and this initially binds also with one hydroxyl
coordinated to a manganese ion. Then, in a second, slower step, a
water molecule that bridges the two active-site manganese ions is
displaced by a second hydroxyl group on the boronic acid (83).
Support for this mechanism is provided by crystal structures, showing
both (15) and (16) are bound in the active site of arginase as
tetrahedral species at alkaline pH (82). Of course, compound (12) is
unable to form the tetrahedral species and is a competitive inhibitor
at all times.

Figure 17.14. (a) Competitive and (b) slow-binding inhibitors of
arginase.

Leucine aminopeptidase (LAP) is a metalloenzyme
that has been inhibited in a slow-binding
manner. This exopeptidase catalyzes
the hydrolytic removal of N-terminal amino acid
residues, particularly leucine, although it has
a broad specificity
(Equation 17.32).


(CH3)2CH-CH2-CH(NH3+)-CO-NH-CH(R1)-CO-NH-CH(R2)-CO-NH-CH(R3)-... + H2O

        --(leucine aminopeptidase)-->                                (17.32)

leucine + H3N+-CH(R1)-CO-NH-CH(R2)-CO-NH-CH(R3)-...

Figure 17.15. Slow-tight-binding inhibitors of leucine aminopeptidase.


Bestatin (17) (Fig. 17.15) and amastatin
(18) have been identified as slow-tight-binding
inhibitors of LAP from porcine kidney, with Ki
values in the low nanomolar range (84). Later,
bestatin was shown to be a slow-binding inhibitor
of LAP employing mechanism B, with a Ki
value of 0.11 µM and a Ki* value of 1.3 nM.
Values of 1.5 × 10^-2 s^-1 and 2 × 10^-4 s^-1
were obtained for k4 and k-4 (Equation 17.29),
respectively (85). It was assumed that the inhibition
of bovine lens leucine aminopeptidase
(blLAP) by amastatin would also proceed by
mechanism B. This prediction was supported
by an X-ray crystallography study of the
amastatin-blLAP complex (86), which suggested
that (18) (and, by analogy, (17)) initially
binds to a Zn2+ ion in a groove in the active
site. The slow step in binding was seen as a
subsequent coordination to a second Zn2+
ion located deeper in the active site (86).
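
The bestatin constants quoted above can be checked against one another with a few lines of arithmetic. The sketch below assumes the usual relation for a two-step (mechanism B) inhibitor, Ki* = Ki × k-4/(k4 + k-4), which is taken here to be the form of Equation 17.30; the function names are illustrative only and do not come from the text.

```python
import math


def overall_ki_star(ki, k_forward, k_reverse):
    """Overall inhibition constant Ki* for a two-step (mechanism B) inhibitor.

    ki        -- dissociation constant of the initial E.I complex (mol/L)
    k_forward -- rate constant for E.I -> E.I* (s^-1), k4 in the text
    k_reverse -- rate constant for E.I* -> E.I (s^-1), k-4 in the text

    Assumes Ki* = Ki * k-4 / (k4 + k-4).
    """
    if ki <= 0 or k_forward < 0 or k_reverse <= 0:
        raise ValueError("constants must be positive")
    return ki * k_reverse / (k_forward + k_reverse)


def reverse_isomerization_half_life(k_reverse):
    """Half-life (s) of E.I* with respect to the reverse isomerization step."""
    return math.log(2) / k_reverse


# Values reported for bestatin and leucine aminopeptidase (85).
ki_star = overall_ki_star(ki=0.11e-6, k_forward=1.5e-2, k_reverse=2e-4)
half_life = reverse_isomerization_half_life(2e-4)

print(f"Ki* = {ki_star * 1e9:.2f} nM")
print(f"half-life of E.I* = {half_life / 60:.0f} min")
```

Under this assumption the calculated Ki* is roughly 1.4 nM, in reasonable agreement with the measured value of 1.3 nM, and the tightened complex relaxes back to E • I with a half-life of nearly an hour.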

It is difficult to find clear-cut examples of
slow-binding inhibition occurring by mechanism
A. However, the inhibition of Factor Xa
by a peptidyl-α-ketothiazole was found to be
unusual because it appeared that the formation
of E • I was partially rate limiting. Factor
Xa is a trypsin-like protease found in the blood
coagulation pathway. It cleaves prothrombin
to form thrombin, which in turn promotes
blood clotting (Equation 17.33).

Inhibitors of Factor Xa activity offer potential
as anticoagulants and several irreversible
inhibitors of Factor Xa have been developed.
One of the few tight-binding reversible inhibitors
of Factor Xa is BnSO2-D-Arg-Gly-Arg-ketothiazole
(19).

The inhibitor could be displaced from Factor
Xa by substrates and, based on steady-state
assumptions, the dissociation constant
for (19) was found to be 14 pM (87). However,
the reaction progress curves indicated a slow-binding
process, probably by mechanism B.
Stopped-flow fluorescence studies, combined
with kinetic analysis, showed that the isomerization
step (E • I ⇌ E • I*) is unusually fast
and that the formation of E • I is at least
partially rate limiting.
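
Slow-binding behavior of this kind is usually recognized from the curvature of reaction progress curves. As a minimal sketch, the function below evaluates the integrated rate equation commonly fitted to such curves, [P] = vs·t + (v0 - vs)(1 - exp(-kobs·t))/kobs, where v0 and vs are the initial and final steady-state velocities and kobs is the apparent first-order rate constant for the transition between them. It is assumed, not confirmed by this section, that the treatment in Section 2.4 uses this form; the numerical values are hypothetical and chosen only for illustration.

```python
import math


def slow_binding_product(t, v0, vs, k_obs):
    """Product formed at time t in the presence of a slow-binding inhibitor.

    t     -- time (s)
    v0    -- initial velocity (concentration/s)
    vs    -- final steady-state velocity (concentration/s)
    k_obs -- apparent first-order rate constant for onset of inhibition (s^-1)
    """
    if k_obs <= 0:
        raise ValueError("k_obs must be positive")
    return vs * t + (v0 - vs) * (1.0 - math.exp(-k_obs * t)) / k_obs


# Hypothetical parameters: velocity falls from 1.0 to 0.1 units/s.
V0, VS, K_OBS = 1.0, 0.1, 0.01

for t in (0, 60, 300, 1200):
    print(f"t = {t:5d} s   [P] = {slow_binding_product(t, V0, VS, K_OBS):8.2f}")
```

The curve starts with slope v0 and bends over to a final slope vs; fitting kobs at several inhibitor concentrations is what allows mechanisms A and B to be distinguished.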

In some instances the type of inhibition has 
been found to be isozyme specific. For exam¬ 
ple, inducibly expressed isozymes (iNOS) and 
constitutively expressed isozymes (cNOS) of 
nitric oxide synthase (NOS) all catalyze the 
conversion of L-arginine to L-citrulline and ni¬ 
tric oxide (Equation 17.34). 

The inhibition of human iNOS by N-(3-(aminomethyl)benzyl)acetamidine
(20) (Fig.
17.16) was found to proceed by mechanism B,
with an overall Kd of <7 nM. Conversely, inhibition
of constitutive isoforms of the human
enzyme was found to be rapidly reversible,
with Ki values in the micromolar range (88).
This is in contrast to results obtained for the
arginine analog, L-Nω-nitroarginine (21),
which was found to be a rapidly reversible inhibitor
of mouse macrophage iNOS, with a Ki of
4.4 µM, and a slow-binding inhibitor of brain
cNOS, with a Kd (assuming mechanism A) of
15 nM (89).

Many more examples of these types of in¬ 
hibitors can be found in the review by Morri¬ 
son and Walsh (78). 


Figure 17.16. Inhibitors of nitric oxide synthase.

Figure 17.17. Pyrophosphate analogs used to inhibit DNA polymerase.

2.5 Inhibitors Classified on the Basis of
Structure/Mechanism

As with any reaction, an enzyme-catalyzed reaction
must proceed from the ground state
through a transition state before products are 
formed. In addition, there are often some high- 
energy intermediates along the pathway. 
Knowledge and understanding of an enzyme's 
mechanism permits the identification of the 
high-energy intermediates and the prediction of 
the structures of the transition states. Armed 
with that knowledge, it is possible to design en¬ 
zyme inhibitors based on the structures of the 
various intermediates along the reaction path¬ 
way. Inhibitors designed in this manner are oc¬ 
casionally referred to as mechanism-based in¬ 
hibitors. However, for purposes of this chapter, 
we will reserve that term for the covalently bind¬ 
ing inhibitors described in Section 3. 

2.5.1 Ground-State Analogs. The ground 
state of an enzymatic reaction consists of the 
substrates and the products. Compounds that 
mimic the substrate of an enzymatic reaction 
have been examined earlier (Section 2.3) and 
are not discussed again here. There are many 
examples of enzymatic reactions that are in¬ 
hibited by some or all of the reaction products. 
Both epinephrine and S-adenosyl-L-homocys- 
teine, for example, are inhibitors of phenyleth- 
anolamine N-methyltransferase (Equation 
17.35). In much the same way as described 
earlier for substrate analogs, product analogs 
can also be used to obtain information about 
the binding mechanism of enzymes (90). 

Phosphonoformate (22) (Fig. 17.17) is an 
antiviral agent that is used clinically in the 
treatment of herpes simplex virus (HSV) and 
human cytomegalovirus (HCMV) (91). It acts 
as a product analog, blocking the pyrophos¬ 
phate-binding site, in the reaction catalyzed 
by DNA polymerase (Equation 17.35). It is 
also effective, using the same mechanism, 
against HIV reverse transcriptase (91). 


Equation 17.35 (scheme): in the presence of Mg2+, DNA polymerase adds a
deoxynucleoside monophosphate from a dNTP to the 3'-OH of the primer
strand, paired to the template, and releases pyrophosphate (PPi).

DNA polymerase catalyzes the transfer of a
complementary deoxynucleoside monophosphate
moiety from its triphosphate (dNTP) to
the 3' hydroxyl of the primer terminus, with
subsequent release of pyrophosphate (PPi, Equation
17.35). Initially, phosphonoformate (22) and
phosphonoacetate (23) were identified as inhibitors
of HSV DNA synthesis (92). Detailed
kinetic studies (93), using DNA polymerase induced
by avian herpes viruses, showed that
phosphonoacetate (23) was a noncompetitive
inhibitor with respect to the four dNTPs. At low
levels of dNTPs it was a noncompetitive inhibitor
with respect to the substrate DNA, becoming uncompetitive
at saturating dNTP levels. It was also found
that (23) was a competitive inhibitor with respect to
pyrophosphate, with a Ki value in the low micromolar
range, in the dNTP-PPi exchange reaction
catalyzed by a turkey virus DNA
polymerase (93). The inhibition patterns were
identical to those observed using pyrophosphate
as an inhibitor. Therefore it was concluded
that (23) acted as an analog of pyrophosphate
and competed for the same binding
site (93). Later, both (22) and (23) were confirmed
to act as pyrophosphate (i.e., product)
analog inhibitors of isolated HSV DNA
polymerase (94).

2.5.2 Multisubstrate Analogs. A large number
of enzymatic reactions involve the simultaneous
binding of two or more substrates at
the active site. The bound substrates must be 
in close proximity to each other and positioned 
in such a way as to facilitate covalent bond 
formation or the transfer of a functional group 
from one substrate to another. Multisubstrate 
analog inhibitors mimic the simultaneous 
binding of two or more substrates at the active 
site of the enzyme. The advantage of this, for a 
bireactant system, is shown in Equation 
17.36. 


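\[
\begin{aligned}
\mathrm{E} + \mathrm{A} + \mathrm{B} &\;\overset{K_A K_B}{\rightleftharpoons}\; \mathrm{E \cdot A \cdot B} \\
\mathrm{A} + \mathrm{B} &\;\overset{K_{Bi}}{\rightleftharpoons}\; \mathrm{A \cdot B} \\
\mathrm{E} + \mathrm{A \cdot B} &\;\overset{K_{MS}}{\rightleftharpoons}\; \mathrm{E \cdot A \cdot B}
\end{aligned}
\tag{17.36}
\]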


There are two ways the two substrates, A
and B, may bind to the enzyme to form an
E • A • B complex. First, and most likely, they
bind individually (in either a random or an
ordered fashion) with dissociation constants
of KA and KB. Second, the substrates may
come together, positioned in such a way as to
facilitate their subsequent reaction, with a dissociation
constant of KBi. This reactive complex
A • B then binds to the enzyme with a
dissociation constant of KMS. In general, the
formation of A • B is entropically unfavorable.
However, a bisubstrate analog, designed to
mimic A • B, can often be prepared by covalently
connecting the corresponding substrates
or substrate analogs with a suitable
linker group. Linking the two groups effectively
overcomes the unfavorable entropic
barrier. It has been calculated that an ideal
bisubstrate analog inhibitor can bind up to 10^8
times more tightly than the product of the
substrate-binding constants (i.e., 1/KBi may be
as high as 10^8 M). This figure is based on
entropic considerations and also assumes a
perfect fit of the bisubstrate analog inhibitor
to the two binding sites on the enzyme (57).

Where does this high affinity come from? A
multisubstrate analog inhibitor will bind
more tightly than a single-substrate analog inhibitor
because it has (1) the entropic advantage of
reduced molecularity and (2) an additive binding
contribution from each of the substrates it
mimics. For example, when two single-substrate
analog inhibitors bind separately, but
next to each other, two sets of translational 
and rotational entropies are lost. However, 
when a bisubstrate analog inhibitor binds it 
loses only a single set of translational and ro¬ 
tational entropies (57, 60). Further, let us as¬ 
sume that the bisubstrate analog binds to the 
same two sites as two single-substrate analog 
inhibitors. In that case there will be a gain in 
entropy from the release of water molecules 
from each substrate-binding site, as well as 
the favorable enthalpic contributions from the 
formation of hydrogen bonds, buried salt 
bridges, and so forth in each site. These favor¬ 
able free-energy contributions will be the 
same for a bisubstrate analog as for the two 
individual inhibitors binding simultaneously. 
On the other hand, compared to the binding of 
a single-substrate analog, the multisubstrate 
analog inhibitor gains favorable binding en¬ 
thalpies and entropies from the additional 
binding site(s), while still losing only one set of 
translational and rotational entropies. Thus 
the binding of a multisubstrate analog should 
be very tight, without needing any assistance 
from transition-state complementarity. 
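
To put the entropic argument on a free-energy scale, an affinity factor can be converted to an energy with ΔG = RT ln(factor). The short sketch below does this for the theoretical maximum of 10^8 quoted above; the temperature of 298.15 K is an assumption made here for illustration, and the function name is not from the text.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def free_energy_of_factor(factor, temperature=298.15):
    """Free energy (kJ/mol) equivalent to a given fold change in affinity.

    Returns the magnitude RT*ln(factor); tighter binding corresponds to a
    more negative binding free energy by this amount.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return GAS_CONSTANT * temperature * math.log(factor) / 1000.0


max_advantage = free_energy_of_factor(1e8)
print(f"10^8-fold tighter binding = {max_advantage:.1f} kJ/mol")
print(f"                          = {max_advantage / 4.184:.1f} kcal/mol")
```

At 25 °C the ideal 10^8 advantage corresponds to about 46 kJ/mol (roughly 11 kcal/mol), which shows how much binding energy can in principle be gained simply by linking two fragments.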

Inhibitors that combine two substrates are
termed bisubstrate analogs, whereas those
combining three substrates are termed trisubstrate
analogs and so on, with the former being
the most common. The design of a bisubstrate
analog inhibitor ordinarily requires the
development of two single-substrate analog
inhibitors of reasonable affinity. The two single-substrate
inhibitors are then connected by
an appropriate linker, and the optimal length
of the linker is determined experimentally.
Under normal circumstances, the Ki value for
a bisubstrate analog inhibitor can be expected
to approximate the product of the Ki values of
the two substrate analogs. A guide to a reasonably
achievable Ki for a bisubstrate analog also
may be obtained from the product of the KM
values of the individual substrates. For example,
if two substrates of an enzymatic reaction
have binding constants in the millimolar
range, a bisubstrate analog would be expected
to have a Ki value in the micromolar range.
Note also that, if the enzyme binds substrates 
in a random manner, then a multisubstrate 
inhibitor should exhibit competitive inhibi¬ 
tion patterns with each substrate it mimics 
because the binding of the inhibitor should be 
mutually exclusive with that substrate. If the 
enzyme employs an ordered mechanism, then 
the inhibitor should be competitive with the 
first substrate to bind and uncompetitive with 
other substrates. 
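
The rule of thumb described above is easy to express numerically. The sketch below treats the expected Ki of a bisubstrate analog as the product of the two single-site constants divided by an effective molarity; taking the effective molarity as 1 M reproduces the simple product rule, and 10^8 M corresponds to the theoretical limit cited from Ref. 57. Framing the estimate in terms of an effective molarity is an interpretation adopted here to keep the units consistent, and the function name is hypothetical.

```python
def bisubstrate_ki_estimate(k_a, k_b, effective_molarity=1.0):
    """Rough Ki (mol/L) expected for a bisubstrate analog.

    k_a, k_b           -- dissociation constants (mol/L) of the two
                          single-substrate analogs, or the substrate KM values
    effective_molarity -- entropic advantage of linking (mol/L); 1.0 gives the
                          simple product rule, 1e8 the theoretical limit
    """
    if k_a <= 0 or k_b <= 0 or effective_molarity <= 0:
        raise ValueError("all constants must be positive")
    return k_a * k_b / effective_molarity


# Two millimolar binders, as in the example in the text.
typical = bisubstrate_ki_estimate(1e-3, 1e-3)
ideal = bisubstrate_ki_estimate(1e-3, 1e-3, effective_molarity=1e8)

print(f"product rule estimate: {typical:.1e} M")
print(f"theoretical limit:     {ideal:.1e} M")
```

Two millimolar binders give a micromolar estimate, as stated in the text; real inhibitors generally fall between this value and the theoretical limit, depending on how well the linker positions the two fragments.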

The multisubstrate analog approach to enzyme
inhibition has the additional advantage
that it provides a high degree of specificity.
The combination of two or more substrates 
will usually produce a unique structure, un¬ 
likely to bind to other enzymes that may uti¬ 
lize any one of the substrates. This approach 
has even been used to design isozyme-specific 
inhibitors (95). It should also be noted that the
distinction between a transition-state analog 
(Section 2.5.3) and a multisubstrate analog in¬ 
hibitor is often quite arbitrary. In fact many 
inhibitors described as transition-state ana¬ 
logs are often actually analogs of high-energy 
reaction intermediates that, in turn, may have 
structures somewhat akin to those of multi¬ 
substrate analog inhibitors. However, multi¬ 
substrate analog inhibitors are intended to 
mimic the combined substrates in their 
ground-state forms and do not require any 
contribution from transition-state stabilization.
Several general reviews on multisubstrate
analog inhibitors have appeared (96-98),
and multisubstrate analogs also receive
some discussion in reviews on transition-state 
analogs (99-101). 

Glycinamide ribonucleotide transformylase
(GAR TFase) catalyzes the transfer of a
formyl group from N10-formyltetrahydrofolate
to glycinamide ribonucleotide (Equation
17.37). This is a crucial step in de novo purine
biosynthesis, which is essential for cell divi¬ 
sion, and GAR TFase has become a target en¬ 
zyme for the development of antineoplastic 
agents. 

Inglese et al. (102) were able to synthesize
the bisubstrate inhibitor β-thioGAR-dideazafolate
(β-TGDDF) (24) (Fig. 17.18). This
compound combines nearly all the features of
both substrates, linked by a stable thioether
bridge, and was found to inhibit GAR TFase
with a Ki value of 250 pM (102). β-TGDDF
acted as a slow, tight-binding inhibitor (Section
2.4) and the Ki value was about three
times lower than the product of the KM values
of the substrates. More recently, the crystal
structure of the complex between BW1476U89
(25) and GAR TFase was obtained (103).
BW1476U89 is another multisubstrate analog
and has a Ki value of about 100 pM (104). The
structure confirms that the inhibitor binds in
those sites identified previously as substrate-binding
sites, and provides a starting point for
development of even more potent transition-state
analogs.

Figure 17.18. Bisubstrate analog inhibitors of GAR TFase.

The condensation of carbamyl phosphate
and L-aspartate, catalyzed by aspartate transcarbamoylase
(ATCase), produces N-carbamyl-L-aspartate
(Equation 17.38). This is one
of the early steps in de novo pyrimidine biosynthesis,
also a requirement for cell division,
making ATCase also a target for potential anticancer
agents.

carbamyl phosphate + L-aspartate → N-carbamyl-L-aspartate + Pi    (17.38)

N-Phosphonoacetyl-L-aspartate (PALA) (26)
(Fig. 17.19) was initially designed as a transition-state
analog inhibitor of ATCase (105). It
was found to have a Ki value of 27 nM, a value
that is considerably lower than the KM values
of 27 µM and 17 mM for carbamyl phosphate
and L-aspartate, respectively (105). PALA was
found to inhibit cell growth in vivo (106) and,
eventually, underwent clinical trials as an anticancer
agent (107).

Figure 17.19. Putative transition state, substrate, and inhibitors of aspartate transcarbamylase.

PALA provides an example of the difficulties
in distinguishing between a multisubstrate
analog and a transition-state analog. As
shown in Fig. 17.19, PALA (26) in effect combines
two fragments, an analog of carbamyl
phosphate (27) and succinate (28). The tight
binding of PALA also suggested it was a potential
transition-state analog. However, succinate
has a Ki value of 90 µM, and the product
of the Ki values of succinate and carbamyl
phosphate is 24 nM, which is almost identical
to the Ki value of PALA (105). As shown in Fig.
17.19, the transition-state structure (29) for
the ATCase-catalyzed reaction is tetrahedral.
The pyrophosphate analog (30) was expected
to provide a much better mimic of the transition
state, yet its Ki value of 0.24 µM was tenfold
higher than that of PALA (108). It is not
clear why there is this discrepancy, but a recent
X-ray structure of the ATCase-PALA
complex identified several groups that are positioned
to bind to a tetrahedral transition
state (109). Two of these, the side chain of
Gln137 and the backbone carbonyl of Pro266,
were positioned to interact with the amino
group of the putative transition state (29).
However, these groups would not be expected
to interact so well with the analogous oxygen
atom of the pyrophosphate transition-state
analog (30), perhaps leading to its weaker-than-expected
binding.

The statins are a group of cholesterol-low¬ 
ering agents that have become some of the 
largest selling drugs in the world. They lower 
serum cholesterol levels by competitively 
inhibiting 3-hydroxy-3-methylglutaryl-coen- 
zyme A (HMG-CoA) reductase, a key enzyme 
in cholesterol biosynthesis (Equation 17.39). 



HMG-CoA + 2 NADPH → mevalonate + CoASH + 2 NADP+    (17.39)


Several statin inhibitors of HMG-CoA reductase
are shown in Fig. 17.20. They consist
of rigid, hydrophobic groups connected to an
HMG-like group that, in inhibitors such as
mevastatin (compactin) (31), simvastatin (9)
(Fig. 17.1), and the dichlorophenol derivative
(32), is present in the form of a lactone. In vivo,
the lactone is converted to the free acid, as
shown in Fig. 17.20 for mevastatin (33). More
recently developed statins, such as fluvastatin
(34) and atorvastatin (35), are prepared as the
free acids. These inhibitors have Ki values in
the low nanomolar range (110), significantly
lower than the KM value of the substrate
HMG-CoA, which is in the micromolar range
(110, 111). Given that these inhibitors did not
appear to be transition-state analogs, Nakamura
and Abeles (112, 113) conducted a number
of experiments to determine the basis
of the enhanced affinity of, in particular, (31)
and (32).

Both mevastatin (31) and (32) were found
to bind to the hydroxymethylglutarate portion
of the active site, but not the NADPH region,
whereas only (31) bound to the coenzyme A
portion. D,L-Mevalonate and D,L-3,5-dihydroxyvalerate,
used as analogs of the upper portion
of the statins, were both poor inhibitors, with
Ki values in the millimolar range, whereas
analogs of the hydrophobic decalin region of
mevastatin showed no inhibitory effect (112).
Given that the Ki value for mevastatin is almost
eight orders of magnitude lower than
that of D,L-3,5-dihydroxyvalerate, it is clear that
the hydrophobic lower portion (and its covalent
link) must play a significant role in the
binding of (31) and, by implication, the binding
of all the statins. Presumably, the upper
portion of the inhibitor is necessary for specificity
and the hydrophobic region for binding
affinity. The hydrophobic region must be relatively
nonspecific because a variety of hydrophobic
groups (Fig. 17.20) are accepted. In
some cases (e.g., mevastatin), the hydrophobic
group overlaps the CoASH site and in others,
such as the dichlorophenol group of (32), it
does not (112). The structure of the statin is
analogous to that of a bisubstrate inhibitor, in
that there is linked binding to two distinct
binding sites on the enzyme, leading to greatly
enhanced inhibition of the enzyme. For mevastatin,
the entropic advantage provided by
linking the mevalonate and decalin portions
together is estimated to be approximately
5 × 10^4 M (113). This is quite a reasonable enhancement,
given that the theoretical maximum
is 10^8 M (57), and it has been suggested
that such a "hydrophobic anchor" is responsible
for the enhanced binding of some inhibitors
of alcohol dehydrogenase and adenosine
deaminase (113).

Figure 17.20. Statin inhibitors of HMG-CoA reductase.

Although this explanation appeared quite
reasonable, it was thrown into doubt when X-ray
structures of HMG-CoA reductase complexed
with both substrates and products
were obtained (114, 115). These structures
showed that, if the statins bound so that the HMG-like
groups occupied the HMG-binding pocket of
the active site, the bulky hydrophobic groups
of the statins would clash with the residues
lining the narrow pocket into which part of the
coenzyme A bound (115). However, recently,
Istvan and Deisenhofer have obtained X-ray
structures of HMG-CoA reductase bound to
six individual statins, including (9), (31), (34),
and (35) (116). This study showed that the substrate-binding
pocket rearranges to accommodate
the statins, that the statins do bind to the
HMG-binding region, that a shallow hydrophobic
groove now accommodates the hydrophobic
groups, and that none of the NADP(H)-binding
pocket is occupied (116). In toto, the
structural studies supported all interpretations
made some 15 years earlier based on kinetic
studies, and provided definitive evidence
for a hydrophobic anchor enhancing the binding
of the mevalonate portion of the statins.

The evolution of the angiotensin convert¬ 
ing enzyme (ACE) inhibitors is an illuminat¬ 
ing story in the development of enzyme inhib¬ 
itors as therapeutic agents. As shown in 
Equation 17.40, ACE catalyzes the conversion 
of angiotensin I to angiotensin II. 

Angiotensin II, itself a potent hypertensive 
agent, also stimulates the release of a second 
hypertensive agent, aldosterone. In addition, 
ACE catalyzes the cleavage of the nonapeptide 
vasodilating agent, bradykinin (not shown). 
Therefore an ACE inhibitor was seen to have 
the potential to limit three hypertensive ac¬ 
tions. This premise was validated by in vivo 
results with teprotide, a peptide inhibitor of
ACE, which had been isolated from a South
American pit viper (117).

At that time the structure of ACE was unknown,
although it had been identified as a
zinc metalloprotease. It was surmised that its
mechanism and active site might resemble those
of another metalloprotease, carboxypeptidase
A, whose X-ray structure was known. (R)-2-Benzylsuccinic
acid (36) (Fig. 17.21) had been
identified as a potent inhibitor of carboxypeptidase
A, and it was suggested that (36) resembled
the collected products of the hydrolysis
reaction (Fig. 17.21). In other words, (36) was
a biproduct analog and, not unexpectedly, it
was found to bind with an affinity resembling
the combined affinities of the two products
(118). Carboxypeptidase A appeared to have
three main interactions with (36). Two substrate-binding
sites bound the phenyl group
and one carboxylate, and the Zn2+ ion, usually
coordinated to the carbonyl of the amide bond
being cleaved, was now bound to the second
carboxylate. Combining those suggestions
with studies with viper venom peptides, indicating
that a C-terminal proline was effective
in inhibiting ACE, a number of carboxyalkanoylproline
derivatives were tested as ACE
inhibitors (119). Of these, the succinyl-L-proline
derivative (37) was found to be the most
effective, with an IC50 value of 330 µM. Given
that one carboxylate bound to the Zn2+ ion, a
better zinc ligand, a thiol group, was substituted
for this carboxylate, resulting in (38)
with the IC50 value now reduced to 0.2 µM.
Finally, after taking into account the differences
between the active sites of ACE and carboxypeptidase
A, captopril (39) was prepared.
Captopril was found to be a competitive inhibitor
of ACE, with a Ki value of 1.7 nM, and was
the first ACE inhibitor to be marketed. It was
not long before attempts were made to make
captopril more productlike, with the resultant
development of enalaprilat
(40). Enalaprilat was found to be a slow-tight-binding
inhibitor (Section 2.4) of ACE with a
Ki value below 1 nM (120), but it was poorly
absorbed orally. However, the ethyl ester
(enalapril) (41) acted as a prodrug, had good
oral activity, and was marketed. Note that it is
also possible that enalaprilat acts as a transition-state
analog (Section 2.5.3), thereby accounting
for its performing as a slow-tight-binding
inhibitor (121). Following enalapril,
many more ACE inhibitors have been developed,
mainly aimed at increasing oral bioavailability,
removing side effects, or improving
metabolism. These include ramipril (42), the
ester prodrug of ramiprilat (43), with 10 times
better bioavailability than that of enalapril.
Ramiprilat was also shown to be a slow-tight-binding
inhibitor of ACE, operating by mechanism
B, with Ki* (Equation 17.30) of 7 pM
(122). A more detailed discussion of the development
of the ACE inhibitors is available
(121).


2.5.3 Transition-State Analogs. As a chemical
reaction proceeds from substrates to products,
it will pass through one or more transition
states. The energy barrier imposed by the
highest energy transition state controls the
overall rate of the reaction. Enzymes bring
about rate enhancements of 10^10 to 10^15 (123)
by lowering this energy barrier. They do this
by having a greater affinity for the structure of
the transition state than for the structures of
either substrates or products. Although an enzyme
may have good affinity for its substrate,
as evidenced by a low dissociation constant
(KS, Equation 17.41) for the Michaelis (E • S)
complex, the enzyme can further stabilize the
inherently unstable transition state, for example,
by forming extra electrostatic or hydrogen
bonds, by providing more effective hydrophobic
interactions, or by using structural rearrangements
to exclude solvent, thereby
strengthening existing electrostatic contacts.


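\[
\begin{array}{ccccc}
\mathrm{E} + \mathrm{S} & \overset{K_N^{\ddagger}}{\rightleftharpoons} & \mathrm{E} + \mathrm{S}^{\ddagger} & \overset{k_N}{\longrightarrow} & \mathrm{E} + \mathrm{P} \\
\updownarrow K_S & & \updownarrow K_T & & \\
\mathrm{E \cdot S} & \overset{K_E^{\ddagger}}{\rightleftharpoons} & \mathrm{E \cdot S}^{\ddagger} & \overset{k_E}{\longrightarrow} & \mathrm{E \cdot P}
\end{array}
\qquad
\frac{K_T}{K_S} = \frac{K_E^{\ddagger}}{K_N^{\ddagger}} = \frac{k_E}{k_N}
\tag{17.41}
\]
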


Simple transition-state theory states that 
the rate of an enzyme-catalyzed reaction is 
correlated with the rate of a noncatalyzed re¬ 
action by the same factor as the affinity of an 
enzyme for the transition state to the affinity 
of an enzyme for a substrate (Equation 17.41) 
(99). 

Therefore, the magnitude of enzymatic
catalysis (kE/kN) is related to the enhanced
binding of the transition state to the enzyme
(KT/KS). Compounds that can take advantage
of this enhanced binding to the transition
state can prove to be potent and selective enzyme
inhibitors. Such compounds, referred to
as transition-state analogs, can theoretically
have ratios of the binding constants of inhibitor
to substrate (Ki/KS) on the order of 10^-8 to
10^-14. In addition, transition-state analogs
may have the further advantage of reduced
molecularity, as outlined earlier (Section
2.5.2) for multisubstrate analog inhibitors.
Several reviews on the theory and general aspects
of transition-state analog inhibitors are
available and are recommended for a more
complete understanding of this topic (37, 96,
99, 100, 124, 125).
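
The relation between catalytic power and transition-state affinity can be illustrated with a short calculation. The sketch below assumes an ideal analog that captures all of the transition-state binding energy, so that its Ki (expressed as a dissociation constant) equals KS divided by the rate enhancement kE/kN. The KS value used is hypothetical, and the function name is not from the text.

```python
def ideal_transition_state_ki(k_s, rate_enhancement):
    """Limiting Ki (mol/L) for a perfect transition-state analog.

    k_s              -- dissociation constant of the E.S complex (mol/L)
    rate_enhancement -- kE/kN, catalyzed rate over uncatalyzed rate

    Assumes the analog captures the full transition-state affinity,
    so that Ki/KS = kN/kE.
    """
    if k_s <= 0 or rate_enhancement <= 0:
        raise ValueError("inputs must be positive")
    return k_s / rate_enhancement


# Hypothetical substrate with KS = 0.1 mM and a 10^10-fold rate enhancement.
limit = ideal_transition_state_ki(k_s=1e-4, rate_enhancement=1e10)
print(f"limiting Ki = {limit:.1e} M")
```

Real transition-state analogs fall well short of such limits, because a stable molecule can reproduce only part of the geometry and charge distribution of the true transition state.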

The design of a good transition-state mimic 
is quite challenging. It requires, at the least, 
sufficient knowledge of the mechanism of the 
target enzyme to predict transition-state 
structure(s). This is why transition-state ana¬ 
logs are sometimes (but not in this review) 
referred to as mechanism-based inhibitors. A 
detailed knowledge of the true energy profile, 
including details such as the existence of dis¬ 
tinct chemical steps, high-energy intermedi¬ 
ates, and their associated transition states, is 
also useful (126). Further, by definition, the 
transition state is unstable, often highly 
charged, and possesses partially broken/ 
formed covalent bonds. Designing a stable 
compound that will closely mimic a transition 
state is impossible. However, the Hammond 
postulate states that the transition state be¬ 
tween a reactant and a high-energy reaction 
intermediate will resemble the intermediate 
rather than the reactant. It is possible to de¬ 
sign/synthesize an analog of a high-energy in¬ 
termediate. Indeed, the majority of so-called 
transition-state analogs are actually analogs 
of high-energy reaction intermediates. Al¬ 
though a clear distinction exists, the design 
process is, for all practical purposes, the same. 

It should also be noted that an enzyme is
designed to initially recognize the features of
its substrates. Often substrate binding brings
about a conformational change in the enzyme
that will then maximize the attractive forces
between the enzyme and transition state. The
transition-state analog may not possess those
features of the substrate that facilitate rapid
binding, even though its affinity for the enzyme
is extremely high. Although some transition-state
analogs bind rapidly to enzymes,
others bind slowly and show the properties of
the slow-binding inhibitors described earlier
in Section 2.4.

Figure 17.22. (a) Thermolysin-catalyzed hydrolysis of peptide analogs showing putative transition
state, (b) phosphonamidate peptide analog, and (c) fluoroalkene peptide analog.

Slow binding, tight binding, or structural 
similarity to the assumed transition-state 
structure are not, in themselves, sufficient cri¬ 
teria to establish that an inhibitor is a true 
transition-state analog (127). Methotrexate 
(7), for example, is an extremely high-affinity 
(Ki = 58 pM), slow-binding inhibitor of dihydrofolate
reductase (128). On the surface, it
would appear that methotrexate could be clas¬ 
sified as a transition-state analog. However, 
crystallographic studies have shown that 
methotrexate binds with its pterin ring in the 
opposite orientation to that of the substrate, 
dihydrofolate (129, 130). To distinguish be¬ 
tween a high-affinity, ground-state analog and 
a putative transition-state analog requires a 
careful appraisal. There is a fundamental dif¬ 
ference between the entropy change of a uni- 
molecular enzymatic reaction and that of a 
multimolecular solution reaction (131). In ad¬ 
dition, the appropriate rate constant for the 
nonenzymatic reaction is often either not 
available or hard to obtain (132, 133). These 
factors combine to make it difficult to evaluate 


quantitatively the correlation between the en¬ 
hanced rates of enzymatic reactions and the 
tight binding of transition-state analogs. 

In an attempt to develop stringent criteria
for the distinction between transition-state
and ground-state analogs, Bartlett and Marlowe
(127) overcame some of these inherent
difficulties by comparing the binding affinities
of a series of substrate analogs with those of
the corresponding transition-state analogs.
One consequence of Equation 17.41 is that, if
there is a change in structure of a substrate
that alters kcat/KM without altering the nonenzymatic
rate of reaction, then an analogous
structural change in the transition-state
mimic should bring about a similar change in
Ki. Put simply, there should be a linear relationship
between the values of Ki for the transition-state
analog and KM/kcat for the corresponding
substrate. Bartlett and Marlowe
(127) designed a series of dipeptide analog
substrates (44) (Fig. 17.22) for thermolysin in
which the structural variation was remote
from the reactive center and therefore unlikely
to affect the nonenzymatic reaction
rate. The reaction catalyzed by thermolysin is
proposed to proceed by the tetrahedral transition state (45), and with their long P-O bonds, the phosphonamidate compounds (46, Table 17.6) were expected to act as transition-state analog inhibitors. It was found that the K_i values for the putative transition-state analog inhibitors correlated linearly with the K_M/k_cat values of the corresponding substrates, although no correlation was found between K_i and K_M (Table 17.6). The fact that substrate binding (K_M) was relatively unaffected by a change at a remote site was not unexpected, but the observation that the binding of the phosphonamidate inhibitors was greatly affected suggests that these inhibitors were, indeed, transition-state rather than ground-state analogs. Conversely, the K_i values for a series of fluoroalkene isosteres of the same substrates (47) (Fig. 17.22) correlated strongly with K_M but not K_M/k_cat (Table 17.6), indicating that the latter inhibitors were ground-state analogs (134). This approach has also been used to confirm that phosphonic acid peptides were transition-state analog inhibitors of pepsin (135).

Table 17.6 Correlation of K_i Values for Inhibitors of Thermolysin with K_M and K_M/k_cat Values for the Corresponding Substrates (a)

             0.32    2.6    7.0
R = L-Phe    0.19    2.4    20

(a) Data are from Ref. 127.
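The correlation test described above can be sketched numerically. The snippet below is a minimal illustration only: the K_i and K_M/k_cat values are hypothetical (they are not the data of Table 17.6), and the function name `loglog_fit` is an assumption introduced here. It fits log K_i against log(K_M/k_cat) by ordinary least squares; a slope near 1 with a high correlation coefficient is the pattern expected for transition-state analogs, whereas ground-state analogs would be expected to track K_M instead.

```python
import math


def loglog_fit(x_values, y_values):
    """Least-squares fit of log10(y) against log10(x).

    Returns (slope, intercept, r), where r is the Pearson correlation
    coefficient of the log-transformed data.
    """
    lx = [math.log10(x) for x in x_values]
    ly = [math.log10(y) for y in y_values]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    syy = sum((b - mean_y) ** 2 for b in ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical values for illustration only (NOT the data of Table 17.6).
km_over_kcat = [2.0e-5, 1.5e-4, 9.0e-4, 6.0e-3]  # M s, substrates
ki_inhibitor = [1.1e-8, 7.0e-8, 5.2e-7, 2.9e-6]  # M, corresponding analogs

slope, intercept, r = loglog_fit(km_over_kcat, ki_inhibitor)
print(f"slope = {slope:.2f}, r = {r:.3f}")
```

Working in logarithms keeps values that span several orders of magnitude on a comparable scale, which is why the fit is done on log-transformed data rather than on the raw constants.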

One of the most popular targets for design of transition-state analogs is adenosine deaminase. Inhibitors of this enzyme have been used as immunosuppressants and are also potential antitumor agents, whereas lack of adenosine deaminase results in severe combined immunodeficiency disease (SCID).

Adenosine deaminase (ADA), which catalyzes the conversion of adenosine to inosine (Equation 17.42), is an extremely proficient enzyme, providing a rate enhancement of more than 12 orders of magnitude (123). The enzyme-catalyzed reaction is thought to pass through an unstable hydrated intermediate (48) (Fig. 17.23), with a K_T (Equation 17.41) in the region of 10^-17 M (123). Clearly, even a crude analog of (48) would have the potential to be an extremely powerful inhibitor of ADA.

The structures of several inhibitors of ADA are shown in Fig. 17.23. Of these, the antibiotics coformycin (49) and (R)-deoxycoformycin (pentostatin, 50) were found to be potent ADA inhibitors, with K_i values of 1 × 10^-11 M (136) and 2.5 × 10^-12 M (137), respectively. The K_M for adenosine is around 30 μM (138, 139), whereas the K_i of the product, inosine, is 10^-4 M. Thus, the two antibiotics show at least 10^6-fold greater affinity for ADA than for the substrate, suggesting that they are acting as transition-state analogs. By contrast, (S)-deoxycoformycin (51) and 8-ketodeoxycoformycin (52), with K_i values of 33 and 40 μM, respectively, are clearly ground-state analogs. The differences in binding affinities of (R)- and (S)-deoxycoformycin translate to a difference in binding energy of almost 10 kcal/mol and provide an estimate of the energy that can be applied to substrate distortion in formation of the transition-state complex (139).

Figure 17.23. Reaction intermediate and inhibitors of the reaction catalyzed by adenosine deaminase.

adenosine → inosine (catalyzed by adenosine deaminase)    (17.42)
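The binding-energy figure quoted above follows from the ratio of the two inhibition constants. The sketch below assumes the standard relation ΔΔG = RT ln(K_i,weak / K_i,tight) and a temperature of 298.15 K; the temperature and the function name are assumptions made here for illustration, whereas the two K_i values are those given in the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def binding_energy_difference(ki_weak, ki_tight, temperature_k=298.15):
    """Difference in binding free energy (kcal/mol) between two inhibitors.

    Assumes ddG = R * T * ln(ki_weak / ki_tight); the default temperature
    is an assumption, not a value taken from the text.
    """
    return R_KCAL * temperature_k * math.log(ki_weak / ki_tight)


ki_r = 2.5e-12  # M, (R)-deoxycoformycin (value quoted in the text)
ki_s = 33e-6    # M, (S)-deoxycoformycin (value quoted in the text)

ddg = binding_energy_difference(ki_s, ki_r)
print(f"{ddg:.1f} kcal/mol")
```

Under these assumptions the result is about 9.7 kcal/mol, consistent with the "almost 10 kcal/mol" stated in the text.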

Purine ribonucleoside (53) was initially thought to bind to ADA as a ground-state analog, with an apparent K_i of 3 μM (138, 140). However, it was observed that the structures of the ligand and the enzyme were perturbed when purine riboside bound to ADA. 13C-NMR spectroscopy showed that the ADA-bound purine riboside was sp3 hybridized at C-6 (141). The NMR and UV spectra suggested that it was the hydrated form of purine riboside (54) that was binding to ADA and, using the unfavorable equilibrium constant for hydration in solution (1.1 × 10^-7 M, Fig. 17.23), a true K_i value of 3 × 10^-13 M could be calculated for (54) (142). Given the low concentration of the free hydrate in solution, and the rapid onset of inhibition, it appears that purine riboside (53) itself initially binds and is then rapidly converted to (54) in the active site (141). This result, along with the high affinities of (R)-coformycin and (R)-deoxycoformycin, argues that the reaction proceeds by a stereospecific, direct attack of water rather than the double-displacement mechanism that also had been proposed (141). More recently, an X-ray study on adenosine deaminase, which had been crystallized in the presence of purine ribonucleoside, confirmed that it was the hydrated species of purine ribonucleoside that was present in the active site (143). Further, a triad of a zinc atom, a histidine residue, and an aspartic acid residue ensured that the binding was stereospecific, with the 6R isomer (55) being favored.

Figure 17.24. (a) Putative transition state for the dihydroorotase reaction, and (b) boronic acid transition-state analogs.
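The correction from the apparent to the true K_i of the hydrate can be reproduced with a short calculation. The sketch below assumes that only the hydrated fraction of the ligand in solution, K_hyd / (1 + K_hyd), is the species that binds, so that the true K_i is the apparent K_i multiplied by that fraction. The function name is hypothetical; the two input values are those quoted in the text.

```python
def true_ki_of_hydrate(apparent_ki, hydration_constant):
    """True K_i of the hydrated species, given the apparent K_i measured
    against total ligand and the equilibrium constant for hydration.

    Assumes the hydrate is the only binding species, so the apparent K_i
    is scaled by the hydrated fraction K_hyd / (1 + K_hyd).
    """
    hydrated_fraction = hydration_constant / (1.0 + hydration_constant)
    return apparent_ki * hydrated_fraction


apparent_ki = 3e-6  # M, purine ribonucleoside (value quoted in the text)
k_hyd = 1.1e-7      # equilibrium constant for hydration (value quoted in the text)

ki_true = true_ki_of_hydrate(apparent_ki, k_hyd)
print(f"{ki_true:.1e} M")
```

Because the hydration constant is so small, the hydrated fraction is essentially equal to the constant itself, and the result is close to the 3 × 10^-13 M reported for (54).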

The adenosine deaminase story, in many ways, provides a perfect example of the general principles of enzymatic catalysis and the utility of enzyme inhibitors. ADA is an extremely efficient catalyst, producing a rate enhancement of 12 orders of magnitude. 6R-Hydroxy-1,6-dihydropurine riboside (55) has an affinity for ADA about 8 orders of magnitude greater than that of substrates or products; that is, it expresses a substantial fraction of the free energy of binding that separates the transition state from the ground state in an enzymatic reaction. Evidence of the extraordinary ability of an enzyme to discriminate between stereoisomers is provided by the 10^7-fold difference in binding affinities of the 8R-OH (50) and 8S-OH (51) stereoisomers of 2'-deoxycoformycin. Inhibitors were used to differentiate among several potential reaction mechanisms for ADA and, finally, an ADA inhibitor (pentostatin) has proved to be of therapeutic benefit.

Inhibitors of pyrimidine and purine biosynthesis are used as antineoplastic agents. As a consequence, dihydroorotase, which catalyzes the third step of de novo pyrimidine biosynthesis, the conversion of carbamyl aspartate to dihydroorotate (Equation 17.43), is a target for therapeutic intervention.


carbamyl aspartate → dihydroorotate (catalyzed by dihydroorotase)    (17.43)


The reaction is thought to proceed through the tetrahedral activated complex (56) (Fig. 17.24), which is a highly charged, unstable sp3 carbon species (144, 145). At around neutral pH, compound (57), a boron-containing analog of carbamyl aspartate, rearranges to the stable, tetrahedral boronic acid derivative (58). The affinity of (58) for dihydroorotase (K_i = 5 μM) was found to be 10-fold greater than that of carbamyl aspartate (K_M = 50 μM), indicating that (58) was probably acting as a transition-state analog (145). Tetrahedral boronic acid structures are stable, unlike the analogous sp3 carbon species, and boronic acid derivatives of substrate peptides have proved to be quite potent inhibitors of a variety of proteases, particularly serine proteases (146).

Chorismate mutase catalyzes the conversion of chorismate to prephenate (Equation 17.44). This reaction is unusual, in that it is the only pericyclic [3,3] sigmatropic rearrangement (Claisen rearrangement) that is catalyzed by an enzyme.



chorismate → prephenate    (17.44)

Although chorismate mutase does provide a rate enhancement of 2 × 10^6 (147), this unimolecular reaction readily occurs without enzyme, under mild conditions. The reaction was expected to pass through a chairlike transition state (59) (Fig. 17.25), but early molecular orbital calculations indicated that the boatlike transition state (60) was not out of the question (147). In an attempt to define the transition-state structure, several compounds, each designed to mimic a putative transition state, were synthesized and tested as chorismate mutase inhibitors (147). The enzyme was found to be inhibited by the exo-carboxy nonane (61), with an apparent K_i value of 3.9 × 10^-4 M. Conversely, the endo-carboxy nonane (62) did not inhibit the enzyme. The apparent K_i value of the adamantane derivative (63), in which the chair-chair conformation is fixed, was about the same as that of (61). Taken together, these results implied that the reaction proceeded through a chairlike transition state (147).

This approach was later refined by Bartlett and Johnson, who suggested that IC50/K_M ratios of 7 for compound (61) and 12 for compound (63) indicated that these inhibitors were not particularly good transition-state analogs (148). In an attempt to improve potency, and to further define the stereochemistry of the transition state, they synthesized several compounds including the exo- and endo-carboxy unsaturated oxabicyclic ethers, (64) and (65), respectively (148). The exo-compound (64) was not significantly better than its saturated carbocyclic analog (61), but the endo-derivative (65) bound chorismate mutase some 100-fold more tightly than did chorismic acid under the same conditions, with a K_i value of 120 nM (148, 149). Later, monoclonal antibodies elicited against (65) were found to be effective catalysts for the conversion of chorismate to prephenate, with rate enhancements of 200-fold in one case (150) and 10,000-fold in another (151). In both instances it was suggested that the rate enhancement was attributable to increased binding of the transition state by the antibody (150, 151).
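The rate enhancements quoted above can be restated as transition-state stabilization energies. The sketch below assumes the standard transition-state-theory relation ΔΔG‡ = RT ln(rate enhancement) and a temperature of 298.15 K. Neither the formula nor the temperature is stated in the text, and the function and dictionary names are introduced here purely for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def stabilization_energy(rate_enhancement, temperature_k=298.15):
    """Transition-state stabilization (kcal/mol) implied by a rate
    enhancement, assuming ddG = R * T * ln(rate_enhancement)."""
    return R_KCAL * temperature_k * math.log(rate_enhancement)


# Rate enhancements quoted in the text, keyed by the citing reference.
catalysts = {
    "antibody, ref. 150": 200.0,
    "antibody, ref. 151": 1.0e4,
    "chorismate mutase, ref. 147": 2.0e6,
}

energies = {name: stabilization_energy(value) for name, value in catalysts.items()}
for name, energy in energies.items():
    print(f"{name}: {energy:.1f} kcal/mol")
```

Under these assumptions the two antibodies account for roughly 3 and 5.5 kcal/mol of stabilization, compared with about 8.6 kcal/mol for the enzyme.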

X-ray structures are now available of the complexes of (65) with two chorismate mutase enzymes (152, 153), as well as with the less-efficient catalytic antibody (154). Although each active site was found to employ a different constellation of interactions with (65), the dissociation constants for the binding of (65) to the three proteins were strikingly similar, ranging from 0.6 to 3 μM (153, 154). However, the micromolar affinity of (65) for both enzyme and antibody is considerably weaker than might be expected for a good mimic of the transition state, and the antibody is not a particularly efficient catalyst. Wiest and Houk (155) have calculated that the bond lengths for the breaking and forming bonds in the transition state are considerably longer than those for (65), and the neutral inhibitor does not mimic the charge separation that builds up in the transition state. Although the two enzyme active sites have evolved to complement the larger, polarized transition state, the antibody has no residues positioned to stabilize the polar transition state. Further, the active site is smaller and makes more van der Waals contacts with the inhibitor, again features likely to impede catalysis. These features provide evidence for the innate difficulties associated with designing both good transition-state analogs and efficient catalytic antibodies.

Figure 17.25. (a) Putative transition states for the chorismate-prephenate rearrangement and (b) structures of potential transition-state analogs.

3 RATIONAL DESIGN OF COVALENTLY 
BINDING ENZYME INHIBITORS 

For the purposes of this chapter we have divided covalently binding enzyme inhibitors into categories according to Table 17.4. Pseudoirreversible inhibitors are discussed separately and the others are, in order of increasing specificity, chemical modifiers, affinity labels, and mechanism-based inhibitors. The targets for these inhibitors are the chemically reactive groups found within the enzyme's active site. These groups, in the majority of cases, are nucleophiles such as the -OH groups of serine, threonine, and tyrosine, the -SH group of cysteine, and the -COOH groups of aspartic and glutamic acid residues. Other nucleophilic groups include the ε-amino group of lysine and the imidazole ring of histidine. In some cases the -NH2 and -COOH groups of the enzyme's N- and C-termini, respectively, are also active-site nucleophiles, whereas enzymic cofactors may also provide targets for covalently binding inhibitors. Arginine is the only common amino acid that has an electrophilic side chain and it also can be modified with suitable nucleophilic agents. Kyte has recently provided an excellent overview of the general area of active-site modification and labeling (156).

The first group of covalently binding enzyme inhibitors, the chemical modifiers, are small organic molecules, generally electrophiles, that are used to modify the enzyme's side chains in such a way as to produce a stable covalent bond. These are often used to study enzyme inactivation and to identify residues potentially involved in binding and catalysis. Some of the commonly used reagents are
Table 17.7 Commonly Used Reagents for Chemical Modification

Residue Targeted   Reagent                            Other Residues Labeled

Lysine             Acetic anhydride                   These reagents can also react with
                   Isothiocyanates                    the N-terminal amino group
                   Trinitrobenzenesulfonate (TNBS)
                   Cyanate

Histidine          Diethylpyrocarbonate (DEPC)        DEPC should be used at neutral
                                                      pH to minimize reaction with
                                                      lysines, cysteines, and tyrosines

Cysteine           Iodoacetamide, iodoacetate         Iodoacetamide has the potential to
                   p-Hydroxymercuribenzoate           modify histidines and lysines
                   Methyl methanethiosulfonate
                   Ellman's reagent (DTNB)
                   N-Ethylmaleimide

Arginine           Phenylglyoxal                      Phenylglyoxal can react with lysine
                   Butanedione                        Butanedione should be used in the
                                                      dark to prevent reaction with
                                                      tryptophans, histidines, and
                                                      tyrosines

Tyrosine           Tetranitromethane                  Chloramine T also modifies
                   Chloramine T                       histidines and methionines

Tryptophan         N-Bromosuccinimide
                   2-Hydroxy-5-nitrobenzyl bromide

Serine             Diisopropylfluorophosphate
                   Halomethyl ketones

Aspartic acid,     Carbodiimides
glutamic acid      Trimethyl oxonium fluoroborate
                   Isoxazolium salts


listed in Table 17.7. These compounds are chemically reactive and may lead to the modification of both catalytic and nonessential residues. As a consequence, experimental design (such as choice of reagent and reaction conditions, use of substrate protection, etc.) is of utmost importance in carrying out and interpreting chemical modification studies. Although inhibitors of this type are not the prime focus of this chapter (and are not discussed further), it should be noted that most of the kinetic equations that apply to affinity labels also apply to chemical modifiers, and there are a number of texts available that cover this topic (40, 157, 158).

Although the organic modifiers are usually not specific for a given enzyme, the second group, the affinity labels, have a degree of specificity built in. Sometimes described as active-site directed, irreversible inhibitors, affinity labels are usually substrate or product analogs that contain an additional chemically reactive moiety. They first bind to the enzyme's active site in a noncovalent fashion, like rapid reversible inhibitors. However, upon formation of the enzyme-inhibitor complex (E·I), they react by various mechanisms with one or more amino acid residues in close proximity in the enzyme's active site. This results in covalent bond formation between the enzyme and the inhibitor (E-I) (Equation 17.45).

\[
E + I \;\underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}}\; E \cdot I \;\xrightarrow{k_2}\; E\text{-}I \tag{17.45}
\]

Usually the inhibitor contains an electrophilic moiety that labels amino acids containing nucleophilic groups. However, in some cases, a nucleophilic species may be formed, which can react either with arginine or with any tightly bound organic or inorganic low molecular weight cofactors possessing electrophilic sites. Unlike the mechanism-based inhibitors described below, affinity labels do not require activation by catalysis at the enzyme's active site. Most often, the covalent bond formation occurs by an S_N2 alkylation-type mechanism, Schiff base formation, or acylation (156, 159).

Affinity labels, some of which have become successful therapeutic agents, are often used to identify catalytically important residues. In some cases, by examining the pH dependency of the rate of inactivation, it is possible to determine the pK_a of the labeled residue. Again, there are a number of excellent reviews on this topic (160-163), including a complete volume in the Methods in Enzymology series (159).

Recently, Pratt (164) and Krantz (165) have suggested that any inactivator that utilizes an enzyme's mechanism, in the broadest sense, should be described as a mechanism-based inhibitor. Although this is not unreasonable, we have, for the purposes of this chapter, adopted the more narrow view of Silverman (166). In this view, mechanism-based inhibitors (also called suicide substrates, Trojan horse inactivators, enzyme-induced inactivators, k_cat inhibitors, and latent inactivators) are described as unreactive compounds, the structure of which usually resembles that of a substrate or product of the target enzyme, and that undergo a catalytic transformation by the enzyme to species that, before release from the active site, inactivate the enzyme. Thus, these compounds usually contain a latent, reactive functional group that gets activated during the normal catalysis of the enzyme. Upon formation of the initial reversible enzyme-inhibitor complex E·I, the enzyme starts its normal catalytic cycle, leading in a usually rate-determining step to the formation of a highly reactive species, E·I' (Equation 17.46).


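\[
E + I \;\underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}}\; E \cdot I \;\xrightarrow{k_2}\; E \cdot I' \;\xrightarrow{k_4}\; E\text{-}I'',
\qquad
E \cdot I' \;\xrightarrow{k_3}\; E + P \tag{17.46}
\]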

The reactive species can either react with one of the enzyme active-site amino acid residues, to form a covalent bond between the enzyme and the inhibitor (E-I''), or be released into the medium to form product (P) and free active enzyme (E). In some instances the reaction may occur between the reactive species and the enzyme's cofactor, again resulting in inactivation of the enzyme.

It should also be noted that the activation of a mechanism-based inhibitor by its target enzyme is, formally, an example of metabolic activation. However, there is a clear distinction between the activation of a mechanism-based inhibitor described above and the metabolic activation of a prodrug. In the latter case, an inactive precursor is metabolized in the body (either chemically or enzymatically) to metabolites that possess the desired activity. For example, acyclovir (3a) must be metabolically converted to the triphosphate (3b) and released into the medium before it will inhibit viral DNA polymerase. Further discussion on prodrugs may be found in volume 2, chapter 14.

3.1 Evaluation of the Mechanism of 
Inactivation of Covalently Binding 
Enzyme Inhibitors 

The inherent complexity of the inactivation mechanisms of covalently binding enzyme inhibitors makes it necessary to evaluate their proposed modes of action carefully. An overview of the criteria for the study of irreversible inhibitors is provided below.

3.1.1 Criteria for the Study of Affinity Labels. The evaluation of affinity labels is based on the fulfillment of the following criteria:

1. Irreversible inactivation. Inactivation by affinity labels leads to irreversible covalent bond formation between the enzyme and the inhibitor. Unlike the complex between an enzyme and a rapid, reversible inhibitor, the covalent enzyme-inhibitor complex is no longer in equilibrium with free enzyme and inhibitor. Therefore, exhaustive dialysis or gel filtration of the covalent enzyme-inhibitor complex cannot lead to the recovery of free, active enzyme. However, such experiments do not allow distinction among tight-binding, noncovalent inhibitors, affinity labels, and mechanism-based inactivators.

Figure 17.26. Pseudo first-order inactivation kinetics of an active-site directed irreversible inhibitor.

Figure 17.27. Kitz and Wilson plot.

2. Time- and concentration-dependent inactivation showing saturation kinetics. The scheme described by Equation 17.45 is analogous to that described for a simple enzyme-catalyzed reaction (Equation 17.9). This scheme can be described by Equation 17.47, which is analogous to the Michaelis-Menten equation (Equation 17.10).


\[
-\frac{d\ln[E]}{dt} = \frac{k_2[I]}{K_I + [I]}, \quad \text{where } K_I = \frac{k_{-1}}{k_1} \tag{17.47}
\]


According to Equation 17.47, an affinity label should exhibit time- and concentration-dependent inactivation. At low inhibitor concentrations the rate of inactivation is proportional to the inhibitor concentration, whereas at high inhibitor concentrations saturation occurs and no further increase in the rate of inactivation is observed. A typical pseudo first-order plot of log enzyme activity vs. time is illustrated in Fig. 17.26. In some cases nonlinear plots may be obtained, particularly for mechanism-based inhibitors (166, 167).

3. Saturation kinetics and determination of K_I and k_inact. To distinguish the rate and binding constants of rapid reversible inhibitors (k_cat and K_i, respectively) from the rate and binding constants of irreversible inhibitors, the terms k_inact and K_I have been used. To determine k_inact and K_I, saturation kinetics must be obeyed. Saturation is reached when all of the free enzyme is converted to the reversible enzyme-inhibitor complex E·I. At that point, the rate of inactivation is independent of k_1/k_-1 (Equation 17.45), assuming that the rate of formation of the initial reversible enzyme-inhibitor complex (k_1) is significantly greater than the rate of formation of the covalent enzyme-inhibitor complex (k_2). Consequently, a higher concentration of inhibitor will not lead to an increased rate of inactivation. The K_I value represents the concentration of inhibitor leading to the half-maximum rate of inactivation (in analogy to a K_M value), and k_inact is the maximum rate of inactivation at the point of saturation (in analogy to k_cat). To determine the K_I and k_inact values, the enzyme is incubated at various subsaturating concentrations of the inhibitor, from which the half-life of inactivation at each inhibitor concentration is deduced. Using Kitz and Wilson plots (168), the half-life of inactivation at each inhibitor concentration is plotted against 1/[I]. A typical plot is illustrated in Fig. 17.27. The y-intercept represents the half-life of inhibition at infinite inhibitor concentration, with k_inact equal to 0.693/t_1/2. K_I can be determined from the x-intercept, which is equal to -1/K_I. If no saturation occurs with a tested inhibitor, the curve will intercept at the origin of the graph, implying that k_inact is much faster than the formation of the initial reversible enzyme-mechanism-based inhibitor complex. If this is observed, one might use a lower temperature or a different pH to lower k_inact. It should also be noted that, in general, if the affinity label is reacting with an ionizable group that is involved in catalysis, then the pH dependency of k_inact/K_I should mirror that of k_cat/K_M.
Figure 17.28. Using a substrate to protect an enzyme from inactivation by an active site-directed irreversible inhibitor.


4. A binding stoichiometry of 1:1 of inhibitor to the enzyme's active site. In general, complete inactivation of an enzyme requires the binding of one mole of inhibitor per mole of enzyme active site. Exceptions can be certain multimeric enzymes that are inactivated by binding of only one-half mole of inhibitor per mole of enzyme subunit, a phenomenon called half-site reactivity. The stoichiometry of binding is usually determined by incubating an excess of radiolabeled inhibitor with the enzyme to ensure complete irreversible inactivation, followed by either exhaustive dialysis or gel filtration. The enzyme-inhibitor complex obtained, now free of unbound inhibitor, is then examined for its radiolabel and protein content. More recently, developments in high-resolution mass spectrometry have allowed the determination of binding stoichiometry without the need for radiolabeled inhibitor.

5. Substrate protection. Ligands of the enzyme, either substrates or reversible inhibitors, should greatly decrease the rate of modification by the affinity label. Both affinity labels and mechanism-based inhibitors should be active-site directed, thereby competing with the substrate for the same binding site on the enzyme. This can be tested by incubating the enzyme with increasing amounts of substrate at constant inhibitor concentrations. As the substrate concentration is increased, the rate of inactivation will become slower because, under initial velocity conditions, a portion of the enzyme is protected as the E·S complex. A typical plot of the log of enzyme concentration vs. time at different substrate concentrations is shown in Fig. 17.28.

6. Verification of covalent bond formation. In many cases it can be difficult to differentiate between a covalently binding enzyme inhibitor and a very tight but noncovalently binding inhibitor. Although strongly denaturing conditions may not lead to the release of tight, noncovalently bound inhibitors, the covalent linkage between an enzyme and its inhibitor can sometimes be quite labile to nucleophiles and extremes of pH. A frequently used method to determine the covalently modified amino acid residue of an enzyme's active site is peptide mapping. Enzyme-inhibitor complexes, usually prepared from radioactively labeled inhibitor, are treated under mildly denaturing conditions with an appropriate protease. Subsequently, the peptide fragments obtained are usually resolved by high-pressure liquid chromatography and isolated. Analysis of the labeled peptides can be accomplished by Edman degradation and/or mass spectrometry. A good description and example of this method can be found elsewhere (169). Alternatively, electrospray ionization mass spectrometry has been used as a tool to determine the accurate mass of the proteins and enzyme-inhibitor complexes. In a study by Knight et al. (170), this method was successfully used to distinguish between covalent and noncovalent complexes because the latter did not survive the experimental conditions.
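The Kitz and Wilson analysis described in criterion 3 above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the k_inact and K_I values are hypothetical, the half-lives are generated without noise from the saturation expression k_obs = k_inact[I]/(K_I + [I]), and the function names are introduced here for illustration. The fit regresses t_1/2 on 1/[I], then reads k_inact from the y-intercept and K_I from the x-intercept, as the text describes.

```python
import math

LN2 = math.log(2.0)  # the text rounds this to 0.693


def k_obs(inhibitor_conc, k_inact, K_I):
    """Observed pseudo first-order inactivation rate constant
    (saturation form, analogous to the Michaelis-Menten equation)."""
    return k_inact * inhibitor_conc / (K_I + inhibitor_conc)


def kitz_wilson_fit(inhibitor_concs, half_lives):
    """Least-squares fit of t_1/2 against 1/[I]; returns (k_inact, K_I).

    y-intercept = ln2 / k_inact
    x-intercept = -1 / K_I, so K_I = slope / intercept
    """
    x = [1.0 / c for c in inhibitor_concs]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(half_lives) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, half_lives))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return LN2 / intercept, slope / intercept


# Hypothetical, noise-free data for illustration only.
true_k_inact = 0.05  # min^-1 (assumed)
true_K_I = 20e-6     # M (assumed)
concs = [5e-6, 10e-6, 20e-6, 40e-6, 80e-6]  # M, inhibitor concentrations
half_lives = [LN2 / k_obs(c, true_k_inact, true_K_I) for c in concs]

fit_k_inact, fit_K_I = kitz_wilson_fit(concs, half_lives)
print(f"k_inact = {fit_k_inact:.3f} min^-1, K_I = {fit_K_I * 1e6:.1f} uM")
```

With real data the half-lives carry experimental error, and a double-reciprocal plot of this kind weights the low-concentration points heavily; a direct nonlinear fit of k_obs against [I] may then be preferable.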
3.1.2 Criteria for the Study of Mechanism-Based Inactivators. In addition to the requirements described above for an affinity label, a mechanism-based inhibitor should also demonstrate the following:

1. Occurrence of a catalytic step. The major difference between the mechanism of inactivation of mechanism-based inactivators vs. that of any other type of inhibitor is the obligate involvement of a catalytic step, that is, step 2 in Equation 17.46. Initially, the mechanism-based inhibitor binds reversibly to form the E·I complex. The enzyme then starts its normal catalytic cycle, resulting in the conversion of the inhibitor into a reactive species (I'). If the reactive species is electrophilic, it may react with an active-site nucleophile, much like an affinity label. If the reactive species is nucleophilic, it may react with an electrophilic species on the enzyme, probably an oxidized cofactor. Finally, a radical species may be generated that has the potential to react with an enzyme radical, or generate one by hydrogen atom abstraction. The experiments necessary to provide evidence for a catalytic step are obviously strongly dependent on the individual catalytic mechanism involved. The experiments may include spectrophotometric detection of oxidized or reduced cofactor, observing C-H bond cleavage by monitoring the release of tritium, or the detection of some component of cleaved inhibitor (such as fluoride ion as in some examples shown below).

2. No release of the activated species before enzyme inactivation. For a mechanism-based inactivator to retain its high specificity during inactivation, release of the reactive species from the active site must not be part of the normal mechanism of inactivation. A time-dependent increase in the rate of inactivation points to the release of an activated species before inactivation. This increase in the rate of inactivation is brought about by the accumulation of free reactive species in solution. Inhibitors generated in this manner have been termed metabolically activated affinity labels (171). In these cases, as with affinity labels, nonspecific covalent modification of residues other than those located in the active site cannot be excluded. A second test for a metabolically activated affinity label is to add an additional aliquot of fresh enzyme to the incubation buffer. The fresh enzyme should be inactivated at a higher rate than that of the first equivalent of enzyme because there is more reactive species present in solution. By contrast, the mechanism-based inhibitor should show no difference in rate until the concentration of inhibitor is depleted. It should also be noted that the observation of such rate increases requires that the reactive species be relatively stable and not immediately quenched by the incubation buffer.

Additional tests such as the addition of nucleophilic scavengers (e.g., thiols such as dithiothreitol or β-mercaptoethanol) can provide further evidence for the presence of a free, reactive electrophilic species. The scavengers should quench all of the free reactive species, thereby protecting the enzyme from inhibition. Unfortunately, this method cannot exclude the possibility that a nucleophilic thiol may even attack the bound reactive species at the active site of the enzyme (which would also give rise to protection from inactivation). However, the use of a bulky thiol, such as reduced glutathione, should limit that possibility. An alternative scenario occurs wherein the released reactive species returns and reacts faster with an active-site nucleophile than with the added thiol. Clearly this is a complex problem and, consequently, it is advisable to use several different tests to avoid misleading conclusions.

Figure 17.29. Determination of the partition
ratio.

3. Partition ratio. The partition ratio is the
ratio of product release to enzyme inactivation
and is a measure of the efficiency of the
mechanism-based inhibitor. Formally, it
refers to the ratio k_4/k_5 (Equation 17.46).
The most efficient inactivators will have a
partition ratio of zero. In those cases,
theoretically, every enzymatically processed
inhibitor molecule will result in the
inactivation of a molecule of enzyme. Even
though the partition ratio is independent of
the initial concentration of inhibitor, it will
depend on factors such as the rate of diffu¬ 
sion of the reactive species from the active 
site, its reactivity, and the proximity of the 
target for covalent bond formation. A num¬ 
ber of different methods have been used to 
determine the partition ratio. For example, 
if, under the experimental conditions, the 
rate of inactivation is relatively fast com¬ 
pared to the chemical stability of the en¬ 
zyme or the inhibitor, the partition ratio 
can be determined kinetically by titration 
of the enzyme activity. The titration mea¬ 
sures the number of inhibitor molecules re¬ 
quired to completely inactivate the en¬ 
zyme. In an experiment of this type, 
increasing amounts of inhibitor are added 
to a known, fixed amount of enzyme, and 
the reaction is allowed to go to completion. 
After gel filtration or dialysis, the remaining
enzyme activity is plotted against the
amount of inhibitor per enzyme active site
(Fig. 17.29). The intercept with the
x-axis represents the minimum number of 
equivalents of inhibitor necessary to inac¬ 
tivate the enzyme completely (turnover 
number). A turnover number of 6, such as 
that shown in Fig. 17.29, indicates that on 
average 5 equivalents of inactivator are 
converted to product and only every sixth 
equivalent of inhibitor leads to irreversible 
covalent bond formation (i.e., the partition
ratio equals the turnover number minus 1).
Unfortunately, there are a number of fac¬ 
tors associated with this method that may 
lead to misleading results (166). Another 
method for determining partition ratios is
equilibrium dialysis of the enzyme with
radiolabeled inactivator, followed by
determination of the amount of radiolabeled
metabolites produced per radiolabeled
enzyme. Perhaps the simplest method is for
cases where the rate of product formation
(i.e., k_cat = k_4 in Equation 17.46) can easily
be measured. In this instance, both k_cat and
k_inact are measured directly, with k_cat/k_inact
being the partition ratio (166, 167).

A more detailed discussion of the re¬ 
quirements for mechanism-based inhibi¬ 
tion can be found in a recent review by Sil¬ 
verman (166). 
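
The two routes to the partition ratio described
above (titration of enzyme activity, and the
ratio of directly measured rate constants) can
be sketched numerically. The Python fragment
below is only an illustration under stated
assumptions: the titration plot is taken to be
linear over the points supplied, the function
names are hypothetical, and the data are
invented so that the turnover number matches
the value of 6 quoted for Fig. 17.29.

```python
def turnover_number(inhibitor_per_site, percent_activity):
    """Return the x-intercept of the least-squares line through the
    titration points (equivalents of inhibitor vs. remaining activity)."""
    n = len(inhibitor_per_site)
    mean_x = sum(inhibitor_per_site) / n
    mean_y = sum(percent_activity) / n
    sxx = sum((x - mean_x) ** 2 for x in inhibitor_per_site)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(inhibitor_per_site, percent_activity))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -intercept / slope


def partition_ratio_from_titration(inhibitor_per_site, percent_activity):
    """Partition ratio = turnover number minus 1."""
    return turnover_number(inhibitor_per_site, percent_activity) - 1.0


def partition_ratio_from_rates(k_cat, k_inact):
    """Partition ratio as k_cat / k_inact (same units for both)."""
    return k_cat / k_inact


# Hypothetical titration data: activity falls linearly and would reach
# zero at 6 equivalents of inhibitor per active site.
equivalents = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
activity = [100.0 * (1.0 - x / 6.0) for x in equivalents]

print(turnover_number(equivalents, activity))
print(partition_ratio_from_titration(equivalents, activity))
print(partition_ratio_from_rates(k_cat=5.0, k_inact=1.0))
```

With real data the fit would normally be
restricted to the linear portion of the plot,
because curvature near complete inactivation
shifts the extrapolated intercept.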

3.2 Affinity Labels 

Affinity labels are potentially good drugs, al¬ 
though the presence of a reactive functional 
group can make them somewhat nonselective 
and prone toward reaction with other proteins 
and metabolites. If the affinity label is highly 
selective toward its target enzyme and has a 
great affinity for the enzyme's active site, this 
drawback can be overcome kinetically. Once 
the inhibitor is bound, the unimolecular reac¬ 
tion between the inhibitor and an amino acid 
residue in close proximity is entropically quite 
favorable compared to a bimolecular reaction 
between two free molecules in solution. This 
proximity effect has resulted in rate
enhancements as great as 10^8 (172) and means that a
reagent that is, in itself, only weakly active, 
may be highly reactive when it is reformulated 
as an affinity label. More in-depth discussion 
on this topic can be found elsewhere (39,173, 
174). 

The design of a potent affinity label re¬ 
quires the study of the initial requirements for 
the inhibitor to bind to the active site. Next, 
regions of bulk tolerance are determined that 
are useful for the introduction of a reactive 
functional group. In some cases, it might be 
advantageous to place the reactive group at 
the end of a spacer arm, particularly if no nu¬ 
cleophilic amino acid residue is in close prox¬ 
imity to the reactive group. However, not only 
the location and orientation, but also the size 
and inherent reactivity of the reactive func¬ 
tional group are critical for its potential as an 
affinity label. 

Figure 17.30. Inhibition of chymotrypsin by TPCK.

Perhaps the archetypical example of an
affinity label is TPCK (66) (Fig. 17.30). This
compound was designed to mimic substrates
of chymotrypsin such as tosyl-L-phenylalanine
methyl ester (Equation 17.48), thereby
providing a basis of affinity for the
chymotrypsin active site.

[Equation 17.48: methyl N-tosyl-L-phenylalanine]

In addition to mimicking a substrate, it
contains the halomethyl ketone moiety, to
provide a point of covalent attachment (175).
TPCK was shown to irreversibly inhibit
chymotrypsin (it is still employed today to remove
chymotrypsin from trypsin preparations) by
specifically labeling a histidine residue (175),
later identified as His57 (176). After the
success of TPCK, chloromethyl ketones became
extremely popular for the inactivation of
proteases. By incorporating part of the sequence
of the physiological substrate into the halo- 
methyl ketone, it was possible to obtain selec¬ 
tive inactivation of individual proteases (177). 
This selective inactivation also meant that 
chloromethyl ketones became widely used as 
probes for the binding requirements and 
chemically reactive residues in the active sites 
of serine proteases, in particular. Replace¬ 
ment of the chloromethyl ketone moiety by a 
diazomethyl group provided a specific inacti¬ 
vation of cysteine proteases (172). The use of 
TPCK has not been restricted to chymotryp- 
sin, as elegantly demonstrated in a recent re¬ 
port on the inhibition of human aldehyde de¬ 
hydrogenase (178). As a group, proteases 
remain major targets for therapeutic inter¬ 
vention, and peptide-based affinity labels are 
still playing a major role in drug design (179). 

The interconversion of (R)-mandelate and
(S)-mandelate is catalyzed by mandelate
racemase (Equation 17.49). The reaction can be
reversibly inhibited by the substrate analog
atrolactate (67) (Fig. 17.31). Because of its
structural similarity to both (67) and the
substrate, and given the reactivity of the epoxide
group to nucleophiles, (R,S)-α-phenylglycidate
(68) was synthesized as a potential affinity
label of mandelate racemase. The compound
was found to be an irreversible
inhibitor, fitting all the criteria described in
section 3.1.1 (180). Later it was established
that (S)-α-phenylglycidate (S-αPGA) did not
irreversibly inactivate the enzyme, binding
noncovalently and with less affinity than
R-αPGA (69). As shown in Figure 17.31, the
epoxide ring of R-αPGA potentially is subject
to attack at either of two carbons. Attack at
the distal endocyclic carbon atom of (69) (path
a) will result in the formation of (70), whereas
attack at the α-carbon (path b) will yield (71).
The crystal structure of the inactivated
complex revealed that nucleophilic attack of the
ε-amino group of Lys166 resulted in adduct
(72), which is consistent with attack on the
distal carbon of the epoxide ring (181). This
structure confirmed the original design
premise of Fee et al. (180), wherein it was
thought that the distal oxirane carbon
occupied the position similar to the α-proton in
mandelate. Therefore, on binding of the
α-phenylglycidate to the enzyme, the
electrophilic epoxide group would be subject to attack
by the nucleophile responsible for α-proton
abstraction in the normal catalytic cycle.
Further confirmation is provided by the X-ray
structure of (S)-atrolactate bound to the
racemase (181), which reveals that Lys166 has
been pushed away by the α-methyl group of
(S)-atrolactate (which is positionally
equivalent but much larger than the α-proton in
(S)-mandelate). In both structures the positions of
the remaining active-site residues are almost
identical.



$$\text{(R)-mandelate} \;\overset{\text{mandelate racemase}}{\rightleftharpoons}\; \text{(S)-mandelate} \tag{17.49}$$


Perhaps the best-known affinity labeling
reagent is aspirin (73) (Fig. 17.32), a member
of the class of nonsteroidal
anti-inflammatory drugs (NSAIDs), whose
activity was initially reported to result
from its inhibition of prostaglandin
biosynthesis (182, 183). Prostaglandins are involved in
the inflammatory response and can cause
headache and vascular pain in humans.

Prostaglandin synthase, which catalyzes
the first step in the arachidonic acid cascade, is
a heme protein and possesses two activities.
As illustrated in Equation 17.50, a
cyclooxygenase activity is used in the conversion of
arachidonic acid to the bicyclic endoperoxide
PGG2, whereas a peroxidase activity catalyzes
the subsequent reduction of PGG2 to
prostaglandin H2. The latter serves as a branch point
in the production of various prostaglandins as
well as thromboxane A2 and prostacyclin
(PGI2).

Figure 17.31. (a) Inhibitors of mandelate
racemase, (b) potential products from nucleophilic
attack on the epoxide of R-α-phenylglycidate, and (c)
adduct formed by attack of Lys166.

Aspirin (acetylsalicylic acid) was ultimately
confirmed as an inhibitor of prostaglandin
synthetase (184). Incubation with [acetyl-3H]aspirin
showed that one acetyl group was
incorporated per mole of enzyme (185, 186).
Enzymatic digest of the labeled enzyme provided
evidence that a serine residue, later identified
as Ser530, was acetylated (186, 187), probably
by the mechanism shown in Fig. 17.32. The
X-ray structure of bromoaspirin (74)-inactivated
prostaglandin synthase has been solved,
and the bromoacetylation of Ser530 confirmed
(188). The structure was similar to that of
flurbiprofen (75) complexed with
prostaglandin synthase (189). Aspirin is the only NSAID
known to inactivate prostaglandin synthase
through covalent modification, and the
brominated aspirin analog was also determined to be
a potent irreversible inhibitor. Conversely,
flurbiprofen (75), another NSAID, has been
classified as a slow-tight-binding inhibitor
(190) and was expected to induce a
conformational change upon binding. However, there
were no significant differences between the
two X-ray structures, and it is yet to be
determined whether the binding of aspirin also
induces a conformational change in the enzyme.

Figure 17.32. (a) Inactivation of prostaglandin H2
synthase by aspirin, and (b) inhibitors
cocrystallized with prostaglandin synthase.

Although affinity labels have played a
major role in characterizing the active sites of a
large number of proteases, they have also
proved to be particularly useful in mapping
nucleotide-binding sites (161, 163, 191).
Numerous compounds, some of which are shown
in Fig. 17.33, have been designed to be analogs
of the various nucleosides and nucleotides.
Perhaps the best known of these is
5'-p-fluorosulfonylbenzoyl adenosine (5'-FSBA) (76),
which was designed to be an analog of ADP or
ATP (77). It has both the adenine and ribose
moieties, as well as a carbonyl group adjacent
to the 5' position. The latter mimics the first
phosphoryl group of the purine nucleotides. If
the molecule is arranged in an extended
conformation, the reactive sulfonyl fluoride group
would be found in a position analogous to that
occupied by the γ-phosphoryl group of ATP.
This reagent was initially used to explore the
regulatory site of glutamate dehydrogenase (192)
and the active site of pyruvate kinase (193). It
has now been employed to label the NAD and
ATP sites of more than 50 proteins (163).
Modifications to 5'-FSBA have provided the
fluorescent probe (78) as well as the
bifunctional affinity label (79), which has a
photoactivatable azido group as well as the
electrophilic fluorosulfonyl moiety (163). The
bromodioxybutyl compound (80) contains the
adenine, ribose, and 5'-monophosphate of
adenosine monophosphate (AMP). It is also
water soluble and negatively charged at
neutral pH. As described above, a bromomethyl
ketone group will react with a number of
nucleophiles, whereas the dioxo group can
potentially react with arginine residues. This
reagent has a structural similarity to
adenylosuccinate (81) and was used to identify a
critical arginine residue in the active site of
adenylosuccinate lyase, an enzyme whose
deficiency in humans leads to severe mental
retardation and autism (163).


3.3 Mechanism-Based Inhibitors 

Mechanism-based inactivators have great
potential as drugs because they are designed to
be specific toward their target enzyme.
Furthermore, because these compounds are
unreactive until activated within their target
enzyme, they are expected to show little or no
cellular toxicity. The design of
mechanism-based inhibitors requires an understanding of
the binding specificity requirements for the
ligand-recognition site of the enzyme, to
promote the formation of the initial noncovalent
enzyme-inhibitor complex E·I (Equation
17.46). In addition, the choice of an
appropriate latent functional group requires
knowledge of the catalytic mechanism of the target
enzyme with its normal substrate. Finally,
covalent bond formation by the activated
inhibitor (I') will strongly depend on its inherent
chemical reactivity, and its proximity to a
susceptible amino acid residue or cofactor. A
number of excellent reviews and monographs
have appeared on the general design of
mechanism-based inhibitors (166, 167, 194-201).
The following examples have been chosen to
emphasize both the potential for the use of
mechanism-based inhibitors as drugs and
the diversity of their mechanisms of
inactivation.

Figure 17.33. ATP (77), adenylosuccinate (81), and representative affinity label analogs.

Of all the classes of enzymes, the pyridoxal
phosphate (PLP)-dependent enzymes have
been found to be most susceptible to
mechanism-based inhibitors (202). To some extent
this is because the mechanism of catalysis by

Figure 17.34. Inactivation of GABA transaminase by vigabatrin (PMP = pyridoxamine phosphate).
For clarity, substituents on the pyridine ring are omitted.

[Equation 17.53]

Trypanosoma brucei. One of the currently used drugs,
eflornithine (α-difluoromethylornithine, DFMO)
(85), is a mechanism-based inhibitor of ODC.
The inactivation of ODC by DFMO involves
the decarboxylation of DFMO by the enzyme,
with subsequent stoichiometric binding of a
reactive species to the enzyme (204). The
proposed mechanism for inhibition of T. brucei
ODC is outlined in Fig. 17.35.

DFMO initially forms a Schiff base with 
PLP (86), then, following decarboxylation, a
fluoride ion is eliminated, thereby generating 
the electrophilic conjugated imine (87). Attack 
by the nucleophilic thiol group of Cys360 and 
subsequent elimination of a second fluoride 
anion yields a second conjugated imine (88). 
The second imine then undergoes a transaldi- 
mination reaction with the amino group of 
Lys69. The enamine formed in this reaction 
(89) may then undergo an internal cyclization 
to yield a cyclic imine (90). This is the main 
product formed by the alkylation of ODC by 
DFMO (204). Recently, the X-ray structure of 
the DFMO-inactivated T. brucei ODC K69A 
mutant was solved (205). This structure was 
of the second conjugated imine (88) complexed
with ODC and showed that, within a single 
active site, the decarboxylated DFMO bridges 
two subunits, forming a Schiff base with the 
PLP on one monomer and a covalent bond to 
Cys360 on the second monomer. 

The enzyme steroid 5α-reductase is an
NADPH-dependent enzyme that catalyzes the
conversion of testosterone to
dihydrotestosterone, a more potent androgen (Equation
17.54).

$$\text{testosterone} \xrightarrow[\text{NADPH}]{\text{steroid } 5\alpha\text{-reductase}} \text{dihydrotestosterone} \tag{17.54}$$

Dihydrotestosterone, rather than
testosterone, had been implicated in endocrine
disorders such as acne, enlargement of the
prostate, and male pattern baldness, and it was
suggested that 5α-reductase was an attractive
therapeutic target. Initially, finasteride (91)
(Fig. 17.36) was developed as a potent
reversible inhibitor of 5α-reductase with a K_i in the
low nanomolar range (206). Closer
examination revealed that finasteride appeared to be a
slow-binding, high-affinity inhibitor of the
human prostate (type 2) 5α-reductase, with a K_i
of less than 1 nM (207). Finasteride is
currently the drug of choice in the treatment of
benign prostatic hyperplasia, and it is now
thought that finasteride is, in fact, a
mechanism-based inhibitor (208, 209), which acts
through an enzyme-bound
NADP-dihydrofinasteride adduct (Fig. 17.36).

Figure 17.35. Inactivation of ornithine decarboxylase by eflornithine. For clarity, substituents on
the pyridine ring are omitted.

Figure 17.36. Inhibition of steroid 5α-reductase by finasteride (PADPR = phosphoadenosine
diphosphoribose).

In this mechanism, reduction of finasteride
(91) leads to the formation of an enolate (92)
and the subsequent formation of an adduct
with NADP+ (93) (where PADPR =
phosphoadenosine diphosphoribose). The
dissociation constant for the enzyme-inhibitor
complex is less than 10^-13 M, and the partition
ratio for the enzyme-catalyzed formation of
dihydrofinasteride (94) is less than 1.07 (208).
Clearly, finasteride is an extremely efficient
mechanism-based inhibitor. As shown in Fig.
17.36, the NADP+-finasteride adduct will
eventually dissociate and form
dihydrofinasteride (94), although the half-life of 14 days
also points to the effectiveness of finasteride
as a steroid 5α-reductase inhibitor.
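
To put the 14-day half-life in kinetic terms, the
sketch below converts it to a first-order rate
constant and estimates how much adduct would
remain at later times. It assumes simple
first-order decay of the adduct with no
re-formation; the function names are
hypothetical, and only the 14-day figure comes
from the text.

```python
import math

SECONDS_PER_DAY = 86_400.0


def rate_constant_from_half_life(half_life):
    """First-order rate constant k = ln(2) / t_1/2 (units of 1/time)."""
    return math.log(2.0) / half_life


def fraction_adduct_remaining(k, elapsed):
    """Fraction of adduct left after `elapsed`, assuming first-order decay."""
    return math.exp(-k * elapsed)


# Half-life of the enzyme-bound adduct quoted in the text: 14 days.
half_life_s = 14.0 * SECONDS_PER_DAY
k_dissociation = rate_constant_from_half_life(half_life_s)
print(f"k = {k_dissociation:.2e} per second")

for days in (7, 14, 28):
    remaining = fraction_adduct_remaining(k_dissociation,
                                          days * SECONDS_PER_DAY)
    print(f"day {days:2d}: {remaining:.2f} of the adduct remains")
```

On this simplified picture the rate constant is of
the order of 10^-7 per second, which is why the
inhibition behaves as essentially irreversible on
the time scale of dosing.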

Enzymes involved in steroid biosynthesis 
have proved to be good targets, both for ther¬ 
apeutic intervention and for mechanism- 
based inactivators (2). Aromatase, for exam¬ 
ple, catalyzes the final, rate-limiting step in 
estrogen biosynthesis (Equation 17.55). Aro¬ 
matase has proved susceptible to mechanism- 
based inhibitors such as formestane and ex- 
emestane. These are now both used in the 
treatment of breast cancer (210). 

In the last decade there has been a
considerable increase in the occurrence of
antibiotic-resistant microbial pathogens. Vancomycin,
one of the last-resort antibiotics for treating
some gram-positive bacterial infections,
inhibits peptidoglycan synthesis by binding the
terminal D-alanyl-D-alanine (D-Ala-D-Ala)
dipeptide from pentapeptide precursors of
Enterococcus cell walls. VanX is a
zinc-dependent D-Ala-D-Ala dipeptidase (Equation 17.56),
which has been implicated in high-level
resistance to vancomycin (211, 212). As a
consequence, VanX has become a prime drug target
for overcoming vancomycin resistance and a
number of transition-state analogs have been
prepared (213, 214).

[Equation 17.55: reaction catalyzed by aromatase]

[Equation 17.56: hydrolysis of D-Ala-D-Ala by VanX]

The enzyme was also shown to process
dipeptides with bulky C-terminal amino
groups (213) and, using this knowledge, a
novel mechanism-based inhibitor was
recently developed (215). Its mechanism is
shown in Fig. 17.37. D-Ala-D-Gly(SΦp-CHF2)-OH
(95) is a dipeptide-like analog of D-Ala-D-Ala
and is readily accepted by VanX. Cleavage of
the peptide bond and elimination of D-alanine
result in the formation of the metastable
2-p-difluoromethylthioglycine (96), which
spontaneously decomposes, yielding ammonia,
glyoxylic acid, and p-difluoromethylthiophenol
(97). Elimination of a fluoride ion results
in the electrophilic 4-thioquinone
fluoromethide (98), which irreversibly alkylates the
enzyme (99). Interestingly, the turnover of
the analog was faster than that of D-Ala-D-Ala
itself. However, the partition ratio of 7500
indicated that one of the reactive intermediates
must be relatively long-lived (215).
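
The contrast between the partition ratios quoted
for finasteride (less than 1.07) and for the VanX
inhibitor (7500) is easier to appreciate as a
probability of inactivation per turnover. The
sketch below assumes the usual reading of the
partition ratio r as product-forming events per
inactivation event, so that a fraction 1/(1 + r) of
turnovers ends in inactivation; the function names
are hypothetical.

```python
def inactivation_fraction(partition_ratio):
    """Fraction of turnovers that end in covalent inactivation,
    taking the partition ratio as (product released) / (inactivation)."""
    return 1.0 / (1.0 + partition_ratio)


def turnovers_per_inactivation(partition_ratio):
    """Average number of inhibitor molecules processed per enzyme
    molecule inactivated (the turnover number)."""
    return partition_ratio + 1.0


# Values quoted in the text; 1.07 is an upper bound for finasteride.
for name, r in (("finasteride (upper bound)", 1.07),
                ("VanX inhibitor (95)", 7500.0)):
    print(f"{name}: {inactivation_fraction(r):.2e} of turnovers inactivate, "
          f"{turnovers_per_inactivation(r):.2f} molecules processed "
          f"per inactivation")
```

On this reading, roughly every second finasteride
turnover (at least) inactivates the enzyme, whereas
the VanX inhibitor is processed thousands of times
for each inactivation event.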

More detailed examples of approaches used 
to design mechanism-based inhibitors may be 
found in excellent reviews by Silverman (166, 
167) and by Ator and Ortiz de Montellano 
(216). 

3.4 Pseudoirreversible Inhibitors 

Pseudoirreversible inhibitors are the least
common of the covalently binding enzyme
inhibitors. They have some features in common
with both affinity labels (Section 3.2) and
mechanism-based inhibitors (Section 3.3) but
they have one distinguishing feature; that is,
the covalent bond formed between the enzyme
and the inhibitor is reversible (Equation
17.57). As with the affinity labels, initially
they bind to the enzyme's active site in a
noncovalent fashion to form an enzyme-inhibitor
complex E·I but, unlike an affinity label, the
pseudoirreversible inhibitor generally
possesses unreactive functional groups. As with
the mechanism-based inhibitor, the enzyme
then starts the catalytic cycle and an
active-site residue, usually one involved in covalent
catalysis (60), reacts with the inhibitor,
without producing a highly reactive species, and
forms a covalent bond.

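$$\mathrm{E} + \mathrm{I} \;\underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}}\; \mathrm{E \cdot I} \;\underset{k_{-2}}{\overset{k_{2}}{\rightleftharpoons}}\; \mathrm{E{-}I'} \;\overset{k_{3}}{\longrightarrow}\; \mathrm{E} + \mathrm{P} \tag{17.57}$$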

Figure 17.37. Mechanism-based inhibition of VanX.

The covalently bound inhibitor mimics the
normal covalent reaction intermediate
occurring during the normal reaction mechanism.
However, the covalent adduct is far more
stable, with half-lives on the order of several
hours to days. The free enzyme may then,
depending on the lability of the E-I' bond, be
regenerated by hydrolysis or reversal of the
covalent bond. The utility of a
pseudoirreversible inhibitor will be determined by a
combination of the rate of formation of the covalent
enzyme-inhibitor adduct and the half-life for
reactivation.

As may be expected, criteria for the study of
pseudoirreversible inhibitors are very similar
to those for both affinity labels and
mechanism-based inhibitors. However, because of
the inherent reversibility of
pseudoirreversible inhibitors, it may be more difficult to
obtain structural evidence for the covalent
enzyme-inhibitor adduct. Further,
determination of the rate of reactivation and
characterization of the products of the recovery process
will also be of major importance in designating
an inhibitor as pseudoirreversible.

Pseudoirreversible inhibitors can be
broken into two classes, depending on how the
active enzyme is regenerated. In the first class,
exemplified by inhibitors of
acetylcholinesterase, the enzyme is regenerated as the covalent
E-I' bond is hydrolyzed (i.e., k_3 >> k_-2). As
shown in Equation 17.58, acetylcholinesterase
catalyzes the hydrolysis of acetylcholine,
yielding choline and acetate.

$$\underset{\text{acetylcholine}}{\mathrm{CH_3C(O)OCH_2CH_2N^+(CH_3)_3}} + \mathrm{H_2O} \;\xrightarrow{\text{acetylcholinesterase}}\; \underset{\text{acetate}}{\mathrm{CH_3COO^-}} + \underset{\text{choline}}{\mathrm{HOCH_2CH_2N^+(CH_3)_3}} \tag{17.58}$$

Figure 17.38. (a) Mechanism of reaction, (b)
irreversible inhibitors, and (c,d)
pseudoirreversible inhibitors
of acetylcholinesterase.

Acetylcholine is a neurotransmitter that
relays nerve impulses across the
neuromuscular junction. Acetylcholinesterase (AcChE)
rapidly breaks down acetylcholine, thereby
lowering its concentration in the synaptic cleft
and ensuring that nerve impulses are of a
finite length. As shown in Fig. 17.38, a
nucleophilic serine residue reacts with the substrate
to form an acetyl-serine intermediate (100)
with concomitant release of choline. This
intermediate is then rapidly hydrolyzed by
water, producing acetate and regenerated
enzyme. Agents such as parathion (101) and
sarin (102) have found utility as insecticides
and nerve gases, respectively, because they
react with the enzyme to form the active-site
serine-phosphate esters, (103) and (104).
These esters are hydrolyzed extremely slowly
by water, making the inhibition effectively
irreversible (i.e., both k_-2 and k_3 are very
small), although the inhibition can be
overcome with high concentrations of strong
nucleophiles such as hydroxylamine.

More recently, it has been established that
inhibitors of acetylcholinesterase may play a
role in the memory enhancement in patients
with Alzheimer's disease (217). Unlike (101)
and (102), carbamate inhibitors such as
physostigmine (105) and rivastigmine (106) are
classified as pseudoirreversible inhibitors
because they react with AcChE to form a
carbamylated serine (107). By comparison with
the serine-phosphate ester, the carbamylated
serine is rapidly hydrolyzed, thereby
regenerating AcChE. For example, reactivation of the
physostigmine-inactivated enzyme is rapid,
with a t_1/2 of less than 40 min (218).
Rivastigmine, a more useful therapeutic agent, is
considerably longer acting, with a half-life of more
than 10 h (217, 219). Overall, for
pseudoirreversible inhibitors of this type, the
effectiveness and duration of the "irreversible"
inhibition will be controlled by the chemical nature
of the groups transferred to the active-site
nucleophile, making it readily amenable to
manipulation.
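
The practical consequence of the two
reactivation half-lives can be sketched with a
simple first-order model. The fragment below
assumes that decarbamylation is first order and
that no free inhibitor remains to re-carbamylate
the enzyme; the half-lives used are the bounding
values quoted above (40 min and 10 h), and the
function name is hypothetical.

```python
def fraction_still_inhibited(elapsed_min, half_life_min):
    """Fraction of enzyme still carbamylated after `elapsed_min`,
    assuming first-order reactivation with the given half-life."""
    return 0.5 ** (elapsed_min / half_life_min)


# Bounding values from the text (assumed here as exact for illustration).
PHYSOSTIGMINE_T_HALF_MIN = 40.0    # "less than 40 min"
RIVASTIGMINE_T_HALF_MIN = 600.0    # "more than 10 h"

for hours in (1, 2, 6, 12):
    minutes = hours * 60.0
    phys = fraction_still_inhibited(minutes, PHYSOSTIGMINE_T_HALF_MIN)
    riva = fraction_still_inhibited(minutes, RIVASTIGMINE_T_HALF_MIN)
    print(f"{hours:2d} h: physostigmine {phys:.3f}, rivastigmine {riva:.3f}")
```

Because the quoted half-lives are bounds, the
physostigmine values are upper limits and the
rivastigmine values lower limits on the fraction of
enzyme still inhibited.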

In pseudoirreversible inhibitors of the
second class, the enzyme is regenerated by the
inhibitor simply dissociating from the
enzyme; that is, the binding is covalent but
reversible (k_-2 >> k_3). This class can also be
exemplified by an AcChE inhibitor. For example,
the trifluoromethyl ketone (108) binds to
AcChE as a slow-binding inhibitor (Section
2.4.1) with a K_i value of 0.06 nM and a k_off
value of 6.7 × 10^-6 s^-1 (220). A linear
correlation was observed between K_i values of a
series of fluoromethyl ketones and the V_max/K_m
value for the corresponding substrate (220).
This suggests (127) that the tetrahedral
adduct (109), in effect, mimics the transition
state (or a high-energy intermediate), thereby
accounting for the high affinity (Section
2.5.3). The affinity of the inhibitor for AcChE
could be decreased (with a concomitant
increase in the value of k_off) by sequentially
reducing the number of fluorine atoms in the
methyl group adjacent to the ketone (220).
Finally, it should be noted that the two classes of
pseudoirreversible inhibitor can be
differentiated by examining the decomposition products
of the inhibition reaction. When hydrolysis is
required for enzyme regeneration, cleavage
products, such as substituted carbamates, will
be in evidence. Conversely, the
trifluoromethyl ketones will not be broken down by
AcChE and no decomposition products will be
observed.
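
The two constants quoted for the trifluoromethyl
ketone (108) can be combined into rough estimates
of the association rate constant and the
dissociation half-life. The sketch below assumes
a single-step binding model in which
K_i = k_off/k_on; slow-binding inhibitors often
follow more complex two-step schemes, so the
numbers are indicative only, and the function
names are hypothetical.

```python
import math

# Values quoted in the text for the trifluoromethyl ketone (108).
K_I_MOLAR = 0.06e-9      # K_i = 0.06 nM
K_OFF_PER_S = 6.7e-6     # k_off = 6.7 x 10^-6 per second


def association_rate_constant(k_off, k_i):
    """k_on in M^-1 s^-1, assuming one-step binding with K_i = k_off / k_on."""
    return k_off / k_i


def dissociation_half_life_hours(k_off):
    """Half-life of the enzyme-inhibitor adduct, t_1/2 = ln(2) / k_off."""
    return math.log(2.0) / k_off / 3600.0


k_on = association_rate_constant(K_OFF_PER_S, K_I_MOLAR)
t_half = dissociation_half_life_hours(K_OFF_PER_S)
print(f"k_on   = {k_on:.2e} per molar per second")
print(f"t_1/2  = {t_half:.1f} h")
```

Under these assumptions the adduct dissociates
with a half-life of somewhat more than a day,
consistent with the description of the binding as
covalent but reversible.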


4 CONCLUSIONS 

Enzyme inhibitors have long played an impor¬ 
tant role in medicine, pharmacology, and basic 
research. The advances in DNA technology 
have enabled cloning and overexpression of 
large numbers of enzymes, and the ap¬ 
proaches described in this chapter have al¬ 
ready led to the development of novel thera¬ 
peutic agents. However, in the postgenomics 
era, large numbers of new targets have been 
identified. Although the drug discovery pro¬ 
cess moves toward structure-based drug de¬ 
sign as its prime tool, even with high-through- 
put crystallography, not all target proteins 
will be readily accessible. The evolution of al¬ 
gorithms that can predict enzyme function 
and mechanism will ensure that the rational 
design of enzyme inhibitors not only comple¬ 
ments structure-based approaches but contin¬ 
ues to play a stand-alone role in the discovery 
of novel therapeutics.

1 GENERAL INTRODUCTION

1.1 Introduction

The subtle relationship between the efficacy 
and chirality of a drug is an area of research 
that has grown enormously over the past 20 
years because of the recognition that single 
isomer drugs can be more potent and safer 
than their racemic mixtures. It is intended 
that this chapter will provide the reader with 
an appreciation of the area of chirality and 
biological activity. Indeed, consideration of 
chirality in drug design has become ubiqui¬ 
tous because of the greater understanding of 
stereoselective pharmacokinetics, pharmaco¬ 
dynamics, and receptor binding. The enabling 
technologies of chiral synthesis and analysis 
have provided the tools to drive these ad¬ 
vances in the detailed understanding of the 
differential biological activity of stereoiso¬ 
mers. Isomers of identical constitution but dif¬ 
fering in the arrangement of their atoms in 
space are defined as stereoisomers (enanti¬ 
omers and diastereomers are subclasses). En¬ 
antiomers consist of a pair of molecular spe¬ 
cies that are mirror images of each other and 
are not superimposable. 

The term chirality is broadly used within
chemistry and drug development; however,
the terms chirality and chiral are not always well
understood. To clarify the matter,
we can consider that two main situations
exist. In the first case a sample consists of
equal numbers of molecules having an
opposite sense of chirality (heterochiral molecules).
This sample is said to be chiral but racemic.
The second case occurs when a sample is made
up of molecules that all have the same sense
of chirality (homochiral molecules). In this
case the sample is said to be chiral and
nonracemic.

Recognition of the importance of chirality
and biological activity has led to the position
where the regulatory authorities will no
longer consider the registration of a new
racemic compound. Exceptions include cases
where the stereoisomers interconvert in vivo
or where there is a specific advantage or
synergy associated with dosing both
stereoisomers as a racemate. In addition to the benefit
to the patient of a safer and more potent drug,
there are numerous advantages to a company
in developing a single isomer drug. For
example, testing is less costly and less
complex for single isomers, because the Food and
Drug Administration (FDA), which regulates
the approval and sale of drugs and other
products in the United States, requires that both of
the isomers within a racemate be tested. Thus,
the overall development costs and time for a
racemate are greatly increased because of the
requirement for information on all three, the
racemate and both isomers separately. The use of a single
isomer should also result in a lower dosage, at
least one-half that of the racemate. Because of
both the therapeutic advantages and the
greater regulatory burden of proof associated
with a racemate compared with a single
isomer, sales of single isomer drugs have
increased to over $120 billion,
representing more than 30% of the total market in 2000
compared with 3% in 1980 and 9% in 1990.
When combined with the declining figures for
new drug application approvals by the FDA,
the efficient and rapid development of a single
isomer drug is imperative (1).

The fundamental reason for the
differential activity of stereoisomers is that the
majority of molecules that make up living organisms
are chiral, and moreover, exist in only one
enantiomeric form. Thus, stereoisomers will be
seen by the system as different molecules and
will have different effects on the biological
system. Typically therefore, one of the single
enantiomers of a drug will demonstrate greater
potency and/or fewer side effects than the
corresponding racemate, and such examples are
given within this chapter. For example,
Vigabatrin, which is a selective GABA
transaminase inhibitor, gained approval in 1997 as a
racemate. While the two stereoisomers exhibit
the same pharmacokinetics, only the
S-enantiomer is active as an anti-epileptic. However,
there are some limited cases, such as
Tramadol, where a synergistic benefit associated
with the dosing of the racemate is claimed in
comparison with either single enantiomer (2).



Figure 18.1.

1.2 Definition of Chirality 

The definition of chirality and its measurement are described in great
detail in a number of texts (3); however, a brief introduction to the
key issues is given in this section. Specifically, chirality is the
property of a molecule of being nonsuperimposable on its mirror image,
as shown in Fig. 18.1; such a molecule is said to be chiral.

In the majority of cases, chirality results from the three-dimensional
orientation of four different substituents around a carbon atom forming
the chiral center. In addition, the orientation of atoms or groups
around sulfur, phosphorus, and nitrogen atoms can sometimes form a
chiral center. Examples of chiral drugs are numerous but include
Cetirizine (1), Rotigotine (2), and Ifosfamide (3).

When a molecule contains only one chiral center, the two stereoisomers
are known as enantiomers. These may be referred to or labeled using the
configurational descriptors R (rectus, meaning right-handed) or S
(sinister, meaning left-handed), or alternatively, by the direction in
which they rotate plane-polarized light, as d (dextrorotatory) or l
(levorotatory). The D and L configurational descriptors are also
commonly used in classifying the configuration of sugars and amino
acids (see below). In an achiral environment, enantiomers will behave
identically, exhibiting, for example, the same melting/boiling point,
lipid solubility, and nuclear magnetic resonance (NMR) and infrared
(IR) spectra. However, in a chiral environment such as within the
macromolecular components of a living system or a chiral high
performance liquid chromatography (HPLC) system, the enantiomers
display different properties, such as a different route and rate of
metabolism, different biological activity, and different retention
times in chiral HPLC. A frequently quoted analogy for the differing
properties of enantiomers is the hand and glove example. The left and
right hands are enantiomers of one another; that is, they are
nonsuperimposable mirror images. If the right hand, or "enantiomer," is
placed in the right-hand glove, or "receptor," there is a good fit.
Thus, in the case of a true drug and receptor there will be the desired
effect. If the left hand, or "enantiomer," is placed in the right-hand
glove, there is either a poor fit or no fit.

As introduced previously, when a chiral compound is present as exactly
a 1:1 mixture of its enantiomers, it is referred to as a racemate or
racemic mixture. Thus, if a single enantiomer undergoes racemization,
the pure enantiomer is converted to a 1:1 mixture of enantiomers.
Perhaps the most famous example of a chiral drug is Thalidomide. In the
early 1960s, Thalidomide was widely prescribed as a sleeping pill and
as a treatment for morning sickness, with claims that it was completely
safe. We all know of the terrible birth defects suffered by children
born to mothers who took the drug during pregnancy. The drug was taken
as the racemate, and it has been shown that the R-enantiomer is
responsible for the drug's anti-inflammatory activity, whereas the
S-enantiomer causes the teratogenicity. Separation of the racemic
mixture to give the patient only the R-enantiomer is not a simple
answer to the problem. The liver contains an enzyme that converts the
R- into the S-enantiomer, thus negating the benefit of giving the
single enantiomer (4).

As described in this chapter, there are many reactions that can be
performed by chemists to create new chiral centers. When these
reactions are performed in such a way as to create one enantiomer in
greater amounts than the other, the process is called asymmetric or
stereoselective synthesis. The term enantioselectivity refers to the
efficiency with which the reaction produces one enantiomer. This
efficiency is quantitatively described as the enantiomeric excess (ee)
of the product, which is the percentage by which one enantiomer is
produced in excess of the other. Thus a 45:8 mixture of two enantiomers
will have an enantiomeric excess of [(45 - 8)/(45 + 8)] × 100, which is
approximately 70%. It should be noted that if neither the starting
material nor the reaction system is chiral and non-racemic, then the
product will be formed as an equal mixture of the enantiomers (i.e., a
racemate).
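
The enantiomeric excess calculation can be illustrated with a short
Python sketch. The function name and its argument names are
illustrative choices, not taken from the chapter; the only input
assumed is the relative amount of each enantiomer in any consistent
unit.

```python
def enantiomeric_excess(major: float, minor: float) -> float:
    """Return the enantiomeric excess (ee) in percent.

    `major` and `minor` are the relative amounts of the two enantiomers
    (hypothetical argument names); any consistent unit may be used.
    """
    total = major + minor
    if total <= 0:
        raise ValueError("the amounts must sum to a positive value")
    # ee = |difference| / total, expressed as a percentage
    return abs(major - minor) / total * 100.0


# The 45:8 mixture discussed in the text
print(round(enantiomeric_excess(45, 8), 1))  # 69.8
# A racemate has no excess of either enantiomer
print(enantiomeric_excess(50, 50))  # 0.0
```

A racemate gives 0% ee and a single pure enantiomer gives 100% ee, which
are the two limits of the scale.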

Glucose is perhaps the most widely available chiral compound. It is a
monosaccharide and part of the group of sugars (carbohydrates) that
occur naturally. Sugars, along with amino acids, constitute a special
example and are commonly classified with a D- or L-configuration. In
the case of sugars, the D-configuration is given when the hydroxyl
group on the highest numbered chiral carbon atom is on the right-hand
side (with the structure drawn in the Fischer convention as shown in
Fig. 18.3). Likewise for L-configured sugars, the hydroxyl group is on
the left-hand side. In the case of the tetrose sugars there are two
enantiomer pairs as illustrated in Fig. 18.3. Here, the enantiomer
pairs of erythrose (4, 5), namely D-(-)-erythrose (4) and
L-(+)-erythrose (5), and threose (6, 7), namely L-(-)-threose (6) and
D-(+)-threose (7), are shown, with each pair of enantiomers being
diastereomeric with the other pair. Diastereomers can be simply defined
as stereoisomers that are not enantiomers. The prefixes erythro- and
threo- are applied to such systems that contain two asymmetric carbons
where two of the groups are identical and the third is different. The
erythro pair has the identical groups on the same side, whereas the
threo pair has them on opposite sides.

Figure 18.3. Fischer projections of D-(-)-erythrose (4),
L-(+)-erythrose (5), L-(-)-threose (6), and D-(+)-threose (7).

Finally, where a molecule contains more than one chiral center, the
number of stereoisomers increases. In the case of the drug
glycopyrrolate, which contains two chiral centers, there are four
possible stereoisomers as shown in Fig. 18.4. In general, the number of
possible isomers can be calculated from the formula 2^n, where n is the
number of chiral centers.

The four stereoisomers can be divided as shown into two pairs of
enantiomers, where the (R,R)-(8) and (S,S)-(9) stereoisomers are
enantiomers of one another, and the (S,R)-(10) and (R,S)-(11)
stereoisomers are also an enantiomeric pair. The stereoisomers that do
not have an enantiomeric relationship to one another, such as (R,R)-(8)
and (R,S)-(11), are known as diastereomers. Like enantiomers, these
molecules are not superimposable on one another, but unlike
enantiomers, they do not exhibit the same physical, chemical, and
spectral characteristics. Thus, they have different melting/boiling
points, lipid solubility, NMR spectra, and retention times in HPLC or
thin layer chromatography (TLC), and can behave differently in chemical
reactions with achiral reagents. The commercial glycopyrrolate product
contains only the threo isomers (S,R)-(10) and (R,S)-(11).
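
The counting rule and the enantiomer/diastereomer relationships can be
sketched in Python as follows. This is a simplified illustration only:
each stereoisomer is represented by a string of R/S labels, the helper
names are hypothetical, and the sketch assumes that the chiral centers
are not equivalent, so that no meso forms reduce the count below the
maximum number of isomers.

```python
from itertools import product


def stereoisomers(n_centers: int) -> list[str]:
    """List every R/S label combination for n chiral centers."""
    return ["".join(labels) for labels in product("RS", repeat=n_centers)]


def mirror(label: str) -> str:
    """Invert every center, giving the label of the mirror image."""
    swap = {"R": "S", "S": "R"}
    return "".join(swap[center] for center in label)


def relationship(a: str, b: str) -> str:
    """Classify two stereoisomers of the same compound."""
    if a == b:
        return "identical"
    if mirror(a) == b:
        return "enantiomers"
    return "diastereomers"


isomers = stereoisomers(2)
print(isomers)                   # ['RR', 'RS', 'SR', 'SS']
print(relationship("RR", "SS"))  # enantiomers
print(relationship("SR", "RS"))  # enantiomers
print(relationship("RR", "RS"))  # diastereomers
```

For two centers the sketch reproduces the two enantiomeric pairs
described above, with every cross-pair combination classified as
diastereomeric.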

1.3 Pharmacology 

Biological systems are in the main constructed from homochiral
molecules such as L-amino acids or D-sugars. Such systems give rise to
a highly "chiral environment," and hence, it is not surprising that
many drugs possessing asymmetric centers exhibit a high degree of
stereoselectivity in their interactions with biological macromolecules.
In the past 20 years or so, pharmacological and toxicological
investigations have clearly demonstrated significant differences in the
biological activity of some isomeric pairs. Pharmacokinetic
investigations have also led to a better understanding of racemic drug
action.

It is important to introduce two other terms that compare the
pharmacological activity of a pair of enantiomers. The isomer imparting
the desired activity is called the eutomer (in the case of Thalidomide
this is the R-enantiomer), whereas the isomer that is inactive or
causes unwanted side effects is called the distomer (this is the
S-enantiomer for Thalidomide). The potencies of the two isomers are
compared by means of the eudismic ratio, which can be determined in
vitro or in vivo. With the advancement in analytical and preparative
technologies, the researcher is now more able to separate and study
individual enantiomers. Pharmacological assessment of the behavior of
chiral compounds in early phase research is imperative for selection of
the correct isomer for development.
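
As a minimal sketch of how a eudismic ratio might be computed, the
Python fragment below assumes that potency is expressed as an IC50
value, where a lower number means a more potent isomer, so that the
ratio is the distomer IC50 divided by the eutomer IC50. The function
names and the numerical values are hypothetical and are not data from
this chapter. The logarithm of the ratio, commonly called the eudismic
index, is included for convenience.

```python
import math


def eudismic_ratio(eutomer_ic50: float, distomer_ic50: float) -> float:
    """Potency of the eutomer relative to the distomer.

    Assumes potency is given as IC50 (lower = more potent), so the
    ratio is distomer IC50 / eutomer IC50.
    """
    if eutomer_ic50 <= 0 or distomer_ic50 <= 0:
        raise ValueError("IC50 values must be positive")
    return distomer_ic50 / eutomer_ic50


def eudismic_index(eutomer_ic50: float, distomer_ic50: float) -> float:
    """Base-10 logarithm of the eudismic ratio."""
    return math.log10(eudismic_ratio(eutomer_ic50, distomer_ic50))


# Hypothetical IC50 values in micromolar, for illustration only
print(eudismic_ratio(0.5, 50.0))   # 100.0
print(eudismic_index(0.5, 50.0))   # 2.0
```

A ratio close to 1 corresponds to isomers of similar potency, while a
large ratio indicates that the activity resides mainly in the eutomer.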

When a racemate is administered, the overall pharmacological effect may
have one of three general outcomes described below.

1. All activity resides in one of the isomers, the other antipode being
inactive.

2. Both isomers have equal activity.

3. Both isomers have the same activity but differ in potencies.

We will briefly highlight some pertinent examples that help to
elucidate the above general classes. The antihypertensive agent
α-methyldopa is an example where all the desired antihypertensive
activity is confined to a single isomer (the L-enantiomer). It is
noteworthy that L-α-methyldopa is a prodrug, being metabolized to the
L-isomer of the active metabolite, and it is this metabolite that has
the required activity (5). L-Dopa is marketed as the single enantiomer;
during early development it was noted that the D-isomer exhibited
serious side effects such as granulocytopenia (which is defined as a
reduced number of blood granulocytes) (6). It has been reported that
the single enantiomers of Flecainide (12) (Fig. 18.5) have similar in
vitro pharmacological activity. Assessment of the effect of each
enantiomer on the action potential characteristics in canine cardiac
Purkinje fibers gave similar electrophysiological effects. The plasma
concentration data of the enantiomers were only very moderately
enantioselective. This led the authors to conclude that there was no
advantage in administering a single enantiomer of Flecainide over the
racemate (7). It is, however, relatively more common to find examples
where the enantiomers have similar qualitative pharmacological activity
but differ in their potencies. Two classic examples are Warfarin (13)
(see Fig. 18.5 for the structure and the crystallization section for
its preparation) and Propranolol (14). The potency of S-(-)-Warfarin in
vivo is two to five times greater than that of the R-enantiomer (8).
However, this difference in potency is offset by the two- to fivefold
greater plasma clearance of S-(-)-Warfarin (9). These apparently
offsetting properties are only part of the complex pharmacological
story surrounding Warfarin, and to this day, it is still administered
as the racemate.

β-Blocking drugs such as Propranolol (14) (Fig. 18.5) have been shown
to bind stereoselectively to β-receptors, and it is the
S-(-)-enantiomer that exhibits the β-blocking activity. However, in
vitro, the binding of the R- and S-enantiomers varies widely within
this class of compound depending on the structure. Again, a complex
number of pharmacological actions come into play, with the plasma
binding of the R-enantiomer being much greater than that of its
antipode. The two enantiomers are also stereoselectively metabolized at
different rates (R > S). Therefore, the pharmacodynamic outcome can
vary greatly between patients who have different P450 compositions
(10). To add to this already complicated story, the bioavailability of
S-(-)-Propranolol is reduced when given as the single enantiomer
compared with the racemate. This suggests that the presence of
R-(+)-Propranolol has a beneficial effect on the availability of the
S-(-)-enantiomer (11).

1.4 Protein Binding and Metabolism 

The enantiomers of a specific drug can bind stereoselectively to plasma
proteins. For example, acidic drugs bind in an enantioselective manner
to human serum albumin (12). There are two binding sites on albumin,
site I (Warfarin) and site II (indole) (13). The binding of
L-tryptophan to site II is up to 100 times greater than that of
D-tryptophan (14). It has been suggested that binding to albumin can be
used as an indication of the extent of the binding to the drug
receptor. For basic drugs, the binding protein is α1-acid glycoprotein
(AAG), which is relatively non-stereoselective (12, 15).

Enantioselective metabolism and clearance play a prominent role in
determining the pharmacological effect of a drug. For example, a highly
potent, rapidly cleared enantiomer may be of less benefit clinically
than its lower potency antipode, which is more slowly cleared.
Returning to Warfarin, the S-enantiomer is eliminated mainly by
7-hydroxylation, whereas the R-enantiomer is eliminated by ketone
reduction and oxidation to 6- and 8-hydroxywarfarin (16). Tramadol is a
centrally acting analgesic with efficacy and potency ranging between
weak opioids and morphine, and it is currently used as a racemic
mixture (Fig. 18.6). It is metabolized in the liver mainly to
O-desmethyltramadol (ODT), mono-N-desmethyltramadol, and
di-N,O-desmethyltramadol. Of the metabolites, only O-desmethyltramadol
is pharmacologically active, and (+)-ODT has approximately 200 times
greater activity for the μ-opioid receptor (17). It is thought that
this metabolite contributes largely to the analgesic properties of
Tramadol, and for this reason numerous studies have been undertaken on
the activity of the single enantiomers of Tramadol (17). The complex
nature of the interaction of the single enantiomers of a drug with
biological systems described in the introduction will have an effect on
whether a single enantiomer or racemate is taken through to
development.

The following sections describe the available methods for the
separation of enantiomers or their preparation using asymmetric
synthesis. The first sections involve separations of enantiomers using
chromatography or crystallization technology. These are often
considered to be the most expedient methods and should deliver the
single enantiomers in the shortest timescale. Asymmetric synthesis has
developed considerably over the last 20 years and now provides an
alternative, and at times more cost-effective, method of preparing the
single enantiomers. It is noteworthy that the 2001 Nobel Prize in
Chemistry was awarded to Sharpless, Knowles, and Noyori for their
pioneering work in the area of asymmetric synthesis (18).

2 CHROMATOGRAPHIC SEPARATIONS 

The separation of enantiomeric drugs and intermediates by
chromatographic methods is a well-developed area that is broadly
utilized from milligrams to tons (19). Initially, these chromatographic
methods were used to determine the enantiopurity of the compound
obtained from, for example, a separation process. With the continuing
advancement of chiral stationary phases (CSP) and the development of
chromatography, separation of enantiomers using chromatographic
techniques is increasingly seen as the method of choice because of the
speed at which the separation can be achieved (20).

A multitude of chromatographic separation methodologies exist, all of
which can be applied to the separation of enantiomers; for example,
liquid chromatography (LC) (21), gas chromatography (GC) (22), high
performance liquid chromatography (HPLC) (23), capillary
electrophoresis (CE) (24), supercritical fluid chromatography (SFC)
(25), simulated moving bed (SMB) (26), and membrane technologies (27).
Once the appropriate technique has been chosen, at the analytical level
the run time, sensitivity, and selectivity need to be optimized to
improve the limits of detection and the analysis time. At the
preparative scale (milligram to gram), in addition to
enantioselectivity, other factors such as high loading capacity,
robustness, and chemical compatibility are essential requirements when
selecting the CSP (28). In addition, the CSP needs to be readily
available and provide a cheaper process when compared with the
chemical/biological alternatives that are discussed in this chapter.
However, a significant advantage of the method is that almost any
enantioseparation can be achieved with one of about 200 commercially
available chiral selectors by the techniques described above (29). In
this section, we will highlight a number of examples, from the small
semi-preparative scale to large-scale potential manufacturing
processes.

2.1 Small-Scale HPLC Examples 

HPLC is now a widely available and user-friendly method employed for
qualitative and quantitative analysis and is also one of the most
expedient methods for providing the milligram quantities of
stereochemically pure material required for initial testing. Often the
identification of a suitable CSP to effect separation of a specific
pair of enantiomers is seen as being labor intensive and requiring
considerable experimentation. However, the availability of commercial
databases that compile literature on LC enantioseparations makes this
process significantly easier (30). The companies that supply CSPs also
provide detailed information about a specific column's suitability
towards the separation of certain types of compounds (31). This helps
to avoid a "trial and error" approach towards enantioseparations using
chromatography. The use of column switchers to test a number of CSPs
can also be of enormous assistance in a rational screening program.

One example of separation by HPLC is Clenbuterol, which is an orally
active sympathomimetic agent that has specificity for β2-adrenoceptors.
Owing to its bronchodilator properties, it has found use in the
treatment of respiratory disorders in humans and animals (32). The two
enantiomers of Clenbuterol have been separated using a Chirobiotic
column, which consists of a macrocyclic antibiotic stationary phase,
using a mobile phase with a composition of 70% MeOH, 30% acetonitrile,
0.3% acetic acid, and 0.2% triethylamine (33). The enantiomers eluted
as follows: R-(-)-Clenbuterol (15) with a retention time of 8.35
minutes and S-(+)-Clenbuterol (16) with a retention time of 9.12
minutes. The single enantiomers obtained through chromatography were of
>95% optical purity. It has been shown that (-)-Clenbuterol was
100-1000 times more potent than (+)-Clenbuterol in β-adrenergic agonist
bioassays (34).

A number of 1,4-dihydropyridines (DHPs) (17-20), exhibiting axial
chirality (chirality stemming from the nonplanar arrangement of four
groups about an axis), have been separated by small-scale HPLC methods.
This is an important class of drugs that are potent blockers of calcium
currents and have found use in the treatment of cardiac arrhythmias,
peripheral vascular disorders, and hypertension (35). It has been shown
that the enantiomers of chiral DHPs have opposite pharmacological
profiles (35). One of the antipodes is a calcium entry activator, while
the other is a calcium entry blocker. The analytical and
semi-preparative separation using chiral HPLC of a number of DHPs of
the structures shown (Fig. 18.7) has been described (36). Here a number
of different CSPs were utilized and their ability to separate the above
DHPs determined.

2.2 Chromatographic Diastereoisomer Separation

Another approach to the separation of enantiomers by chromatography is
to prepare a diastereoisomeric derivative of the enantiomers to be
separated. As discussed in the introduction to this chapter,
diastereomers exist if there is more than one chiral center, and are
stereoisomers that are not enantiomers of one another. As such they do
not have identical physical properties. In chromatography, formation of
derivatives such as esters, amides, etc., often leads to better
separation of the components. In the case of a racemate, if a chiral
derivatizing reagent (e.g., an acid or amine) is employed, then a
diastereomeric mixture results. One such example is the derivatization
of Pirlindole, which is a racemic anti-depressant drug. Here the use of
amino acid derivatives as chiral derivatizing agents (CDA) was shown to
enable an effective and efficient separation (37). Preparation of the
L-phenylalanine methyl ester derivative (21) enabled separation of the
Pirlindole enantiomers using a medium pressure liquid chromatography
(MPLC) method. This is highlighted in Fig. 18.8; after removal of the
CDA, the enantiomers of Pirlindole were obtained in high optical
purity. This gave several grams of each enantiomer, which permitted a
study of the stereochemical influence at the pharmacological level. The
interaction of Pirlindole racemate and its single enantiomers with
monoamine oxidase A (MAO-A) and B (MAO-B) was studied using biochemical
techniques (in vitro and ex vivo determination of rat brain MAO-A and
MAO-B activity). In vitro, the MAO-A IC50 values of (±)-Pirlindole,
R-(-)-Pirlindole (22), and S-(+)-Pirlindole (23) were 0.24, 0.43, and
0.18 μM, respectively. The differences between the three compounds were
not significant, with a ratio between the two enantiomers
R-(-)/S-(+) of 2.2 in vitro (38).

2.3 Preparative HPLC/SMB 

In the initial discovery phase of drug research, time is the most
important factor: a successful process must be rapidly identified, have
a short run time, and have general applicability. As the project moves
into full development, the process needs to be established and cost
becomes a crucial factor. Thus, on scale-up of an LC method to the
preparative level (100 mg and above), a number of additional important
aspects become relevant. The selection of a suitable CSP from the
plethora available depends on the following factors: CSP availability,
loading capacity and selectivity, throughput, and mobile phase.

The most successful and broadly applied chiral stationary phases
comprise the cellulose- and amylose-based phases developed by Okamoto
(Chiralcel and Chiralpak) (39), brush-type phases developed by Pirkle
(40), some polyacrylamides (Chiraspher) (41), cross-linked
diallyltartramide (42), and to a lesser extent, cyclodextrin-based
phases. Clearly for the larger scale separations, the availability of
the CSP in larger quantities is a prerequisite. It should also be noted
that at the preparative scale, it seems that up to 90% of racemic
compounds tested have been resolved with just four different
polysaccharide-based phases (43).

The degree of separation of the two enantiomers obviously plays an
important part in the CSP selection. Another equally important
parameter is the loading capacity of the stationary phase. The higher
the loading capacity, the greater the amount of material that can be
separated (44). For example, the polysaccharide-based CSPs have a
saturation capacity of 5-100 mg/g of CSP; this is clearly dependent on
the type of racemate that is being resolved. On the other hand,
protein-based CSPs have lower saturation capacities, of the order of
0.1-0.2 mg/g of CSP.

For preparative chromatography, throughput can be defined as the amount
of purified material obtained per unit of time and per unit mass of
stationary phase. Several factors affect this, including loading
capacity, column efficiency, selectivity, column size, temperature,
cycle time, flow rate, and the solubility of the racemate.

The mobile phase plays a crucial role in the separation process for at
least three main reasons: the selectivity of the separation, the
retention time, and the solubility of the racemate are all directly
affected by the mobile phase composition. Other parameters such as
viscosity, solvent recovery, cost, and solvent handling properties also
play a prominent role. This brief introduction is also applicable to
the criteria for CSP selection for SMB.

An example of a drug separated by preparative HPLC is cetirizine
dihydrochloride, a racemic drug that is a second-generation histamine
receptor antagonist. Studies on the effect of racemic, R- (25), and
S-Cetirizine (26) on nasal resistance indicated that the racemate and
the R-enantiomer had similar activity. Both inhibited the
histamine-induced increase in nasal resistance, thus indicating the
antihistaminic properties of R-Cetirizine (45). The S-enantiomer was
shown not to exhibit these antihistaminic effects. An asymmetric
synthesis (46) and the resolution of an intermediate have delivered the
single enantiomer previously. However, for various reasons, the
development of a preparative HPLC method seems to be the method of
choice (47). The main reasons are the rapid scale-up and the improved
economics of this approach. Utilization of the amide (24) (Fig. 18.9)
gave rise to a highly efficient separation using a Chiralpak AD column
in a mixture of acetonitrile/isopropanol (60:40). The efficiency of the
separation can be measured by the α value (2.76) or the USP resolution
(8.54). The α value and USP resolution numbers are measurements of how
efficient the separation is; typically the higher the number, the
better the separation. This enabled the production of 1.6 kg of both
the (+) and (-) isomers in high purity.
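
The two figures of merit mentioned above can be sketched in Python
using their standard chromatographic definitions: the selectivity α is
the ratio of the retention factors of the two peaks, and the USP
resolution is twice the difference in retention times divided by the
sum of the baseline peak widths. The retention times, void time, and
peak widths used below are hypothetical values chosen for illustration
and are not the cetirizine data.

```python
def retention_factor(t_r: float, t_0: float) -> float:
    """Retention factor k = (t_R - t_0) / t_0, with t_0 the void time."""
    return (t_r - t_0) / t_0


def selectivity(t_r1: float, t_r2: float, t_0: float) -> float:
    """Selectivity alpha = k2 / k1 for the later peak over the earlier."""
    return retention_factor(t_r2, t_0) / retention_factor(t_r1, t_0)


def usp_resolution(t_r1: float, t_r2: float, w1: float, w2: float) -> float:
    """USP resolution Rs = 2 (t_R2 - t_R1) / (W1 + W2).

    W1 and W2 are the peak widths at baseline, in the same time unit
    as the retention times.
    """
    return 2.0 * (t_r2 - t_r1) / (w1 + w2)


# Hypothetical chromatogram: times and widths in minutes
t_0, t_r1, t_r2 = 2.0, 4.0, 7.0
w1, w2 = 0.5, 1.0

print(selectivity(t_r1, t_r2, t_0))        # 2.5
print(usp_resolution(t_r1, t_r2, w1, w2))  # 4.0
```

The selectivity depends only on where the peaks elute, whereas the
resolution also accounts for how broad they are, which is why both
numbers are usually quoted together.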

Like all methods for separating chiral molecules, chromatographic
separations do suffer from drawbacks: large quantities of expensive
stationary phases are needed and large volumes of mobile phases are
used, coupled with the resultant high dilution of separated products. A
number of methods have been introduced in an attempt to improve on this
technology, such as recycling (44). Perhaps the biggest advancement in
recent times has been the introduction and application of SMB
technology in the field of chiral separations (48). This technique was
pioneered in the late 1950s by Universal Oil Products in the United
States as a useful method for separation of oil derivatives and sugars
(49). Initially SMB technology was applied to very large volumes of
material. For example, xylene isomers are separated in thousands of ton
quantities annually. The application of SMB to the separation of
racemic mixtures has led to downsizing and modifications of this
technology, but the main principles remain the same. The use of
counter-current contact in SMB maximizes the driving force for mass
transfer and the contact between the substrate and stationary phase.
This provides a more efficient use of the adsorbent capacity than that
of a simple batch system (50).


The separation of racemic mixtures is well suited to SMB technology,
because these counter-current systems can generally only perform
two-component separations at a time (51). A detailed description of
this technique is given in an excellent article by Guest (52). The SMB
system generally consists of several columns, typically 6-12, which are
connected in series. An arrangement of pumps and valves is set up to
maximize the stationary phase utilization, allowing for better solvent
efficiency and adsorbate concentration. This leads to two streams
coming off the system in solution: one is termed the raffinate, which
is enriched in the less adsorbed component, and the other is termed the
extract, which is enriched in the more adsorbed component. The complex
set of conditions and parameters that are required to optimize SMB
chromatography has led to the design and process optimization being
done by computer simulations (53). A number of examples will be
discussed that highlight this growing area of chiral separations. It
should be noted that the scale of operation is dependent on the column
size and can range from tens of grams to tons of separated isomers.
Clearly, the larger quantities separated imply that this technology has
industrial applications.

The enantiomers of aminoglutethimide (27) (Fig. 18.10) have been
separated using an SMB approach (48) (see also Section 3, Fig. 18.11
for more information on aminoglutethimide). A set of 16 columns (6 ×
1.6 cm) containing Chiralcel OJ was used. The feed concentration was
1.63% in a mixture of hexane:ethanol (15:85), which was used as the
mobile phase. A feed rate of 0.45 ml/min and a mobile phase rate of 6
ml/min gave rise to a production of 5.27 g of each enantiomer per day.
The S-(-)-enantiomer was obtained as the extract in solution in 99.8%
purity, while the R-(+)-enantiomer, also in solution as the raffinate,
achieved 99.9% purity. This corresponds to a productivity of 59.9 g of
each enantiomer per kilogram of CSP per day. It should be noted that
one big advantage of SMB over preparative chromatography is the vast
saving on mobile phase consumption; SMB is generally coupled to thin
film evaporators that allow for very high levels of recovery of the
solvent. This becomes even more evident when a poorly soluble compound
is used.
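
The aminoglutethimide figures can be cross-checked with a short Python
sketch. It assumes that the 1.63% feed concentration is weight per
volume (1.63 g per 100 ml) and that recovery of both enantiomers is
complete; neither assumption is stated in the text, and the function
names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def daily_output_per_enantiomer(feed_ml_per_min: float,
                                feed_conc_g_per_ml: float) -> float:
    """Grams of each enantiomer fed (and, ideally, recovered) per day.

    Assumes complete recovery and a 1:1 racemic feed.
    """
    racemate_g_per_day = feed_ml_per_min * feed_conc_g_per_ml * MINUTES_PER_DAY
    return racemate_g_per_day / 2.0


def implied_csp_mass_kg(output_g_per_day: float,
                        productivity_g_per_kg_day: float) -> float:
    """Mass of stationary phase implied by output and productivity."""
    return output_g_per_day / productivity_g_per_kg_day


# Feed: 0.45 ml/min at 1.63% (assumed w/v, i.e. 0.0163 g/ml)
per_enantiomer = daily_output_per_enantiomer(0.45, 0.0163)
print(round(per_enantiomer, 2))  # 5.28, close to the quoted 5.27 g/day

# Quoted output and productivity imply the total amount of CSP in use
print(round(implied_csp_mass_kg(5.27, 59.9), 3))  # 0.088 (kg)
```

Under these assumptions the feed rate and concentration reproduce the
quoted daily output, and the quoted productivity implies that the 16
columns together hold roughly 90 g of stationary phase.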

The two isomers of the racemic analgesic drug Tramadol (28) (Fig.
18.11) display differing affinities for various receptors.
(-)-Tramadol mainly inhibits the reuptake of noradrenaline, whereas the
(+)-isomer inhibits the reuptake of serotonin. In addition, the
(+)-isomer and its primary metabolite, the O-desmethyl derivative, are
selective agonists of μ-opiate receptors (54). Tramadol has been
efficiently separated using SMB; in addition, the resolution by
crystallization is given in Section 3 of this chapter (55). A
comparison between batch chromatography and SMB for the separation of
Tramadol was made. Use of 12 columns (100 × 21.2 mm ID), each packed
with 20 g of Chiralpak AD 20-μm phase, and a mobile phase composition
of 2-propanol/light petroleum/diethylamine (5:95:0.1 v/v/v) with a feed
concentration of 20 g/L gave a very high productivity. Thus, 680 g of
racemic Tramadol could be separated per liter of stationary phase
(which equates to 1.2 kg of racemate per kilogram of stationary phase
per day). The solvent consumption of 144 L/kg of racemate should also
be noted. This gives both (+)- and (-)-enantiomers of high optical
purity, with the extract at 6.33 g/L and the raffinate at 7.69 g/L.
Typically, the solvent (mobile phase) is readily recycled by the use of
thin film evaporators, which further extends the economic practicality
of the process.
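
As a rough illustration of what these Tramadol figures imply, the
Python sketch below derives the daily racemate throughput, the daily
solvent demand, and the dilution of each outlet stream relative to the
feed. It assumes that each enantiomer enters at half the 20 g/L feed
concentration and that the quoted productivity applies to the full
12-column bed; the variable names are illustrative and the derived
numbers are not reported in the text.

```python
# Figures quoted in the text
N_COLUMNS = 12
CSP_PER_COLUMN_G = 20.0
PRODUCTIVITY_KG_PER_KG_DAY = 1.2   # kg racemate per kg CSP per day
SOLVENT_L_PER_KG = 144.0           # litres of solvent per kg racemate
FEED_G_PER_L = 20.0                # racemate concentration in the feed
EXTRACT_G_PER_L = 6.33
RAFFINATE_G_PER_L = 7.69

# Total stationary phase in the unit, in kg
csp_mass_kg = N_COLUMNS * CSP_PER_COLUMN_G / 1000.0

# Racemate processed per day and the solvent needed to do so
racemate_kg_per_day = csp_mass_kg * PRODUCTIVITY_KG_PER_KG_DAY
solvent_l_per_day = racemate_kg_per_day * SOLVENT_L_PER_KG

# Each enantiomer is assumed to enter at half the feed concentration
enantiomer_feed_g_per_l = FEED_G_PER_L / 2.0
extract_dilution = enantiomer_feed_g_per_l / EXTRACT_G_PER_L
raffinate_dilution = enantiomer_feed_g_per_l / RAFFINATE_G_PER_L

print(round(racemate_kg_per_day, 3))  # 0.288
print(round(solvent_l_per_day, 2))    # 41.47
print(round(extract_dilution, 2))     # 1.58
print(round(raffinate_dilution, 2))   # 1.3
```

On these assumptions both product streams leave the unit less than
twofold diluted relative to the feed, which can be compared with the
high dilution noted earlier as a drawback of batch chromatography.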

2.4 Conclusions 

It should be noted that all the techniques described in this chapter
can be inter-linked. In other words, if one technique, e.g., asymmetric
synthesis, fails to deliver enantiopure material, then another
technique such as crystallization can be used to bring the product to
the desired purity. As an example of this "double" approach, the
application of SMB and crystallization to the separation of mandelic
acid is noteworthy (56). When very high levels of enantiopurity are
required, SMB alone may not be efficient or cost-effective. However,
if, for example, a lower enantiomeric excess can be coupled with an
enhancement by crystallization, then the SMB approach becomes even more
favorable. This can lead to substantial increases in the productivity
of the SMB process. Further examples of coupling of two techniques will
be given throughout this chapter.

In summary, chromatographic separations
offer an expedient method for the separation
of enantiomers on a small scale. With the
development of more efficient stationary phases
and the application of SMB, this may become
the method of choice for the separation of
racemates. Each individual case deserves
investigation by all of the techniques and
approaches described in this section.

3 CLASSICAL RESOLUTION

Perhaps the most widely used method for the
preparation of single enantiomers is the
classical resolution of a racemic mixture
through the formation of crystalline
diastereomeric salts. As discussed in the
introduction to this chapter, a racemic
mixture of enantiomers is converted to two
diastereomeric salts with differing physical
properties, one being crystalline and the other
remaining in solution; the salts can then be
separated and simply converted back to the two
separated enantiomers. With the advent of
automation, the classical resolution approach
offers a speedy and thorough racemate separation
methodology. It enables the separation of
small amounts of material (milligram to gram)
and can be directly scaled up to provide an
industrial process (kilogram to ton). A number
of different approaches to this type of
separation are highlighted in the following
sections. It should be noted that diastereomeric
salt resolutions have mistakenly been considered
to be a mysterious art. In fact, there is
considerable information in the literature on
how to perform a resolution and on the physical
chemistry that determines how to define and
conduct the resolution to its optimum
capability (57-59). We will not go into
this in great detail but will highlight some
pertinent points; for greater detail, the
reader is directed to the monograph by
Jacques et al. (57).


3.1 Separation of the Active Pharmaceutical
Ingredient

A number of single isomer switches, that is,
cases where a drug that was previously sold as a
racemate is developed and sold as a single
isomer, have been achieved through classical
resolution (60). This approach to a single isomer
offers several advantages. First, the racemate
is freely available and can be purchased to
high levels of purity and quality. Second, the
analytical methods will already be in place.
Third, no new synthetic development chemistry is
required, and hence this is the fastest route to
the single enantiomers at the multigram scale.
Generally, this is the first method to be tried.
Some of the many available examples, which
demonstrate the different nuances that can be
applied in classical resolution to provide the
single enantiomers in optimal yields and purities,
are given in this section.

An efficient, large scale resolution of
methylphenidate (Ritalin hydrochloride) using
dibenzoyl-tartaric acid has been described
(61). Ritalin is marketed for the treatment of
children with attention deficit hyperactivity
disorder (ADHD). Methylphenidate has two chiral
centers and originally was marketed as a mixture
of two racemates, 20% DL-threo (29, 30) and
80% DL-erythro (31, 32) (see Fig. 18.12 for the
structures of all four isomers). As introduced
previously, the erythro isomer is defined as the
case when the main chain of a molecule
(drawn vertically in a Fischer projection) has
identical or similar substituents at two
adjacent non-identical chiral centers on the same
side of the chain, whereas the threo isomer has
the corresponding substituents on opposite
sides. The racemic drug currently used in
therapy comprises only the pair of threo
enantiomers (29, 30). The mode of action in
humans is not completely understood, but
methylphenidate presumably activates the brain
stem arousal system and cortex to produce its
stimulant effect. In addition, there is no
specific evidence that clearly establishes the
mechanism whereby methylphenidate produces
its mental and behavioral effects in children,
nor conclusive evidence regarding how
these effects relate to the condition of the
CNS. The D-threo enantiomer (29) has, however,
been reported to be 5 to 38 times more
active than the corresponding L-threo
enantiomer (30) (62). The resolution shown in Fig.
18.13 uses the racemic hydrochloride salt as
input material. The HCl salt is cracked to the
free base in situ with 4-methylmorpholine; the
free base then forms a salt with the resolving
agent dibenzoyl-tartaric acid (DBTA). The required
D-threo-methylphenidate (29) is removed as the
crystalline salt of D-(+)-DBTA, leaving the
L-threo enantiomer (30) in solution with
4-methylmorpholine hydrochloride. The use
of 4-methylmorpholine to effect base release
in situ helps to streamline the process and
removes a costly free base isolation step.
The D-threo-methylphenidate D-(+)-DBTA
salt is readily converted into the hydrochloride
salt. It is interesting to note that recently,
Celgene and Novartis received an FDA approvable
letter for the use of dexmethylphenidate
in ADHD. This product consists of only the
D-threo enantiomer (29), in comparison with
the original product, which contained all four
isomers (29-32).

Chemists at Chiroscience took an alternative
approach to the D-threo-methylphenidate
(29) single enantiomer (63). An efficient
resolution using L-(-)-di-toluoyl-tartaric acid
(DTTA) was developed. This left the required
D-threo diastereoisomer in solution with a
diastereomeric excess of 88% in 55% chemical
yield. Conversion of this salt to the free
base and subsequent crystallization of the
hydrochloride salt gave >98% ee D-threo
methylphenidate in high purity in an overall
yield of 42%. The enhancement of the ee is
possible because the eutectic point of
methylphenidate hydrochloride is at 30% ee.
A more detailed description of this phenomenon
is given later in this chapter.

(S)-Naproxen (36) is a non-steroidal
anti-inflammatory drug that was introduced to
market in 1976 by Syntex. The S-(+)-isomer is
about 28 times more effective than the R-(-)-isomer
(64). Annual sales in 1995 were
about $1 billion; thus, a large amount of effort
has been spent developing the synthesis of
(S)-Naproxen (65). The resolution of racemic
Naproxen (33), developed by Syntex, approaches
the ideal case for a Pope-Peachey resolution,
that is, resolution using non-stoichiometric
quantities of resolving agent (66).
Here, a mixture of 1 equivalent (eq) of the
racemic acid, 0.5 eq of an achiral amine base,
and 0.5 eq of the chiral amine (N-alkylglucamine)
is used (Fig. 18.13). This results in the
formation of two salts: one is the insoluble
(S)-Naproxen chiral amine salt (34), obtained in
45-47% yield and optical purity of 99%. The
second salt, which remains in solution, contains
(R)-Naproxen and the achiral amine (35). The
insoluble salt of (S)-Naproxen (34) is removed
by filtration. The mother liquors are then
heated, and the achiral amine base catalyzes
racemization of the unwanted R-enantiomer.
The resulting racemic mixture of the acid
(R,S)-(37) can then be put back into the
resolution loop. Using this process, the overall
yield of (S)-Naproxen is >95%, based on the
input of racemic acid. To further highlight the
efficiency of this process, the N-alkylglucamine
resolving agent is recovered in >98% per
cycle.
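
The way a 45-47% single-pass yield grows to an overall yield above 95% can be illustrated with a simple recycle model. The Python sketch below is an idealized illustration, not the Syntex process accounting: it assumes that each pass isolates a fixed fraction of whatever racemate is charged and that the racemized liquors are returned with a recovery given by the hypothetical parameter `racemization_recovery` (set to 1.0, i.e., no losses, by default).

```python
def cumulative_yield(per_pass_yield, cycles, racemization_recovery=1.0):
    """Cumulative yield of the wanted enantiomer over repeated
    resolution-racemization-recycle passes (idealized model)."""
    remaining = 1.0   # racemate available for the next pass (fraction of input)
    collected = 0.0   # wanted enantiomer isolated so far (fraction of input)
    for _ in range(cycles):
        isolated = remaining * per_pass_yield
        collected += isolated
        # Liquors are racemized and recycled, possibly with some loss
        remaining = (remaining - isolated) * racemization_recovery
    return collected


# Mid-range single-pass yield quoted in the text (45-47%)
for n in range(1, 6):
    print(n, round(cumulative_yield(0.46, n), 3))
```

Under these assumptions the cumulative yield follows 1 - (1 - y)^n for a single-pass yield y, so with y = 0.46 about five passes are needed before the total exceeds 95%.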

Racemic bupivacaine hydrochloride (38,
Marcaine) is currently used as an epidural
anesthetic during labor and as a local anesthetic
in minor operations. Clinical studies have
shown that levo-bupivacaine (41) is less
cardiotoxic in man, making it significantly safer
than the racemate (67). Separation of the
enantiomers was readily achieved using 0.25 eq
of D-tartaric acid. This resulted in the isolation
of a 2:1 (S)-bupivacaine D-tartaric acid salt
(39) in 98% de, leaving the (R)-bupivacaine
free base (40) in solution. Conversion of the
tartrate salt to (S)-bupivacaine hydrochloride
(39) was achieved in 35-40% overall yield
based on racemate input. To improve the
economics of the process, a racemization of the
unwanted R-enantiomer was required. Treatment
of the liquors containing the enriched
(R)-bupivacaine, tartaric acid, propanol, and
propionic acid at reflux resulted in complete
racemization in 2 h. With appropriate processing,
the racemic free base thus obtained is isolated
by crystallization and can be put back into the
resolution cycle (68). Another fine example, by
chemists from Eli Lilly, involves a clever
resolution-racemization-recycle (R-R-R) process
in the synthesis of Duloxetine (69).

As discussed in Section 2 of this chapter,
Tramadol is a chiral drug substance that is
currently used as a high potency analgesic
agent. The preparation of Tramadol is shown
in Fig. 18.16; the Grignard reaction results in
the formation of all four possible stereoisomers
(70). The trans isomers (42, 43)
form over the cis isomers (44, 45) in a ratio of
~8:2; the currently marketed racemate consists
of only the trans isomers. It is possible to
take this crude reaction mixture and selectively
isolate either the (+)-trans isomer (42),
by using di-p-toluoyl-D-tartaric acid [D-(+)-DTTA]
as resolving agent, or the (-)-trans isomer
(43), by using L-(-)-DTTA. This highlights
the high selectivity that can be achieved when
using certain resolving agents. In the case of
Tramadol, the cis isomers (44, 45) do not form
crystalline salts with DTTA and therefore remain
in solution. This results in a highly efficient
process, where the chiral acid not only
separates the single enantiomers (42 or 43)
but also removes other impurities (i.e., cis
isomers 44 and 45) at the same time (71).


Another drug that is sold as a racemate is
Etodolac (46), a non-steroidal
anti-inflammatory agent (NSAID) that also
has analgesic properties; it has the ability to
retard the progression of skeletal changes in
rheumatoid arthritis (72). It has been shown
that the majority of therapeutic activity lies in
the S-(+)-isomer (73). D-(-)-N-Methylglucamine
(meglumine) is obtained by ring opening
of D-glucose with methylamine, and hence
it is readily available and inexpensive.
Scientists at Chiroscience have described the use
of meglumine to separate the enantiomers of
Etodolac (74). It was shown that the meglumine
salt possessed suitable properties to enable
its use as a salt for pharmaceutical
administration. Therefore, in the case of Etodolac,
meglumine can be used not only to separate
the enantiomers but also as the
pharmaceutical salt form of choice.

In addition to the racemic drugs discussed
in this section, resolutions are also used in the
isolation of key building blocks for the
pharmaceutical industry. An important class of
these intermediates is the amino acids, many of
which are available as the single isomer from
natural sources (see the Introduction). The
use of unnatural amino acids and D-configured
ones is expected to have a greater influence
at the biological level. In the drive for
molecular diversity and metabolic stability, a
number of unnatural amino acids such as the
non-proteinogenic piperazine carboxylic acid (47)
(Fig. 18.18) have been developed. Specifically,
this amino acid has found use as an intermediate
for the HIV proteinase inhibitor
L-735,525 (75). The racemic cyclic amino
acid (47) has been resolved with
S-camphorsulfonic acid (CSA), which yields the
S-isomer as the double CSA salt (48) as the
precipitate (76). Retained in the mother liquors is
the R-isomer (49). This can neatly be racemized
by mixing with S-CSA in a
suitable solvent. On seeding with pure (S,S)-
diastereomeric salt, a further quantity of the
desired (S,S) product (48) is obtained, leaving
the R-isomer (49) once more in the liquors.
The whole cycle can be repeated and has been
demonstrated with four complete cycles. To
complete the whole process, the resolving
agent is also readily recovered and recycled.

3.2 Separation of Intermediates to Single 
Enantiomer Active Pharmaceutical Ingredient 

The previous examples given for diastereomeric
salt resolution have all involved separation
of the active pharmaceutical ingredient
(API) or a late stage intermediate. Whereas this
does offer several advantages in terms of
time and quality, there are also
a number of drawbacks. If, for example, a
racemization of the unwanted isomer cannot be
found, 50% of the material is wasted.
Therefore, it can often be advantageous
to conduct the separation at an earlier stage in
the synthesis of the drug. This leads to better
atom efficiency compared with resolution of
the final product, resulting in a reduction of
the overall amount of waste and cost.

One such example is Verapamil, which is a
well-established treatment for cardiovascular
ailments (77). S-(-)-Verapamil (51) has specific
transmembrane calcium channel antagonist
activity, whereas its antipode (53) influences
a wider range of cell pump actions,
including those for sodium ions (78). Verapamil
has been separated into its single enantiomers
by resolution with expensive resolving
agents, which required multiple
recrystallizations to effect complete separation
(79). In the synthetic sequence leading to
Verapamil, several intermediates appeared to be
attractive alternative substrates for resolution
(80). The intermediate verapamilic acid
(Fig. 18.19) was efficiently separated using
α-methylbenzylamine (α-MBA), which is an
extremely cheap resolving agent (81). Subsequent
transformation of the easily obtained R- or
S-verapamilic acid (50 or 52) required a further
three to four synthetic steps to yield the
active pharmaceutical ingredient.


The racemate aminoglutethimide (27) has
been shown to be effective in the treatment of
hormone-dependent breast cancer (Fig.
18.20). Further studies have shown that the
R-enantiomer is more potent than its antipode
as an aromatase inhibitor (82). The resolution
of aminoglutethimide itself has been reported
in the literature, using tartaric acid. This
resolution suffers from the formation of solid
solutions (83), which require numerous
crystallizations to deliver the single enantiomer
(84). Use of a suitable precursor (54) enabled
separation of the intermediate (55) by treatment
with the alkaloid resolving agent
(-)-cinchonidine. This chiral acid was then
cyclized to nitroglutethimide, which, on
reduction, gave the desired R-aminoglutethimide
(56) (85). It is noteworthy that in the case of
aminoglutethimide, the amine functionality is an
aniline moiety. Because of the low pKa associated
with this amine (2.5-4.6), the number of acidic
resolving agents that can be employed is
reduced, because they need to be of relatively
high acidity to form a salt.

3.3 Crystallization-Induced Asymmetric 
Transformation 

A number of amino acids have been separated
by resolution; in certain cases the yield of the
required diastereoisomer has been greater
than 50% (86). p-Chlorophenylalanine is of
considerable pharmacological interest because
of its ability to inhibit serotonin formation
in laboratory animals (87). Both the R-
and S-enantiomers have also been used as
building blocks in the synthesis of other drugs.
An ingenious approach to R-p-chlorophenylalanine
methyl ester, which is based on a
one-pot resolution-racemization sequence, is
highlighted in Fig. 18.21. Here, treatment of
racemic p-chlorophenylalanine methyl ester
(57) with 0.5 eq of D-tartaric acid and 0.1 eq of
salicylaldehyde in methanol gave a 68% yield
of the 2:1 R-p-chlorophenylalanine D-tartaric
acid salt (58) in 98% enantiomeric purity.
The absolute yield is greater
than 50% because the S-enantiomer is
racemized in situ. The 2:1 tartrate salt is
crystalline and is therefore removed from the
system by virtue of its insolubility. This drives
the equilibrium further in favor of the 2:1
R-p-chlorophenylalanine D-tartrate salt (88).
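
The stoichiometry behind this result can be made explicit. The Python sketch below is a hedged illustration: the equivalents, the 2:1 salt ratio, and the 68% yield come from the text, whereas the function `theoretical_ceiling` and its treatment of in situ racemization as a simple on/off switch are assumptions introduced for the example.

```python
def theoretical_ceiling(acid_equiv, base_per_acid, racemizes_in_situ):
    """Upper bound on the fraction of racemate input that can be
    isolated as the salt of one enantiomer (idealized)."""
    # The resolving agent can only bind this fraction of the input base
    stoichiometric_limit = min(1.0, acid_equiv * base_per_acid)
    # Without racemization only half of the input is the wanted enantiomer
    enantiomer_limit = 1.0 if racemizes_in_situ else 0.5
    return min(stoichiometric_limit, enantiomer_limit)


observed_yield = 0.68  # 2:1 salt (58), as quoted in the text

classical = theoretical_ceiling(0.5, 2, racemizes_in_situ=False)
with_racemization = theoretical_ceiling(0.5, 2, racemizes_in_situ=True)
print(classical, with_racemization, observed_yield > classical)
```

Because 0.5 eq of a diacid forming a 2:1 salt can bind a full equivalent of amine, the only barrier to exceeding 50% is the supply of the R-enantiomer, which the salicylaldehyde-mediated racemization removes.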

While the rational design of resolving agents
remains the common goal (89), it is
clear that we are still some way from achieving
it. An alternative "family" approach
to classical resolution has been demonstrated
by Vries et al. (90). A group of similar
resolving agents is mixed simultaneously with the
racemate. This was done to shorten the time
required to complete the resolving agent
screen. It should be noted that the members of
each family of resolving agents are very similar
and that the crystalline species obtained by this
method contained more than one of the
resolving agents. As with all screens, analysis
of the data is often time consuming and laborious.
Bruggink et al. have shown that differential
scanning calorimetry (DSC) of the isolated
salts can help to determine quickly whether
the isolated salt will provide a thorough
resolution (91). However, with a methodical and
precise screening protocol, it is nearly always
possible to find a suitable resolving agent that
effects separation of the enantiomers (92).

4 NONCLASSICAL RESOLUTION 

4.1 Preferential Crystallization 

A brief description of the types of "racemic"
compounds is necessary for the reader to
understand the principles behind the
application of crystallization methods to the
separation of enantiomers. Three fundamental
types of crystalline racemates exist. In the
first, the crystalline racemate is a conglomerate,
which exists as a mechanical mixture of
crystals of the two pure enantiomers. The second,
which is the most common, consists of the two
enantiomers in equal proportions in a
well-defined arrangement within the crystal
lattice; this is termed a racemic compound. The
third possibility occurs with the formation of a
solid solution between the two enantiomers,
which coexist in an unordered manner in the
crystal. This kind of racemate is called a
pseudoracemate and is rather rare. Conglomerates
have been estimated to be approximately 10%
of all racemates (93). Diagrammatic
representations of the first two types of
racemate are shown in Fig. 18.22.

Figure 18.22.

By understanding the appropriate phase
diagrams, which describe the melting behavior
of the two enantiomers (binary melting
point phase diagram) or their solubility
behavior in the presence of a solvent (ternary
solubility phase diagram), the separation of
enantiomers can be carried out reproducibly.
Phase diagrams for
the three types of racemate are shown in Fig.
18.23. For a full and detailed explanation of
this topic, refer to the monograph of Jacques et
al. (57).

4.2 Enrichment of Enantiomeric Excess by 
Crystallization 

The attainment of high levels of enantiopurity
is not always possible by enzymatic or
diastereomeric resolutions or by asymmetric
syntheses alone. It is, however, frequently possible
to prepare a pure enantiomer from a partially
resolved sample by simple recrystallization.
For this process to succeed, the initial
enantiopurity of the mixture must be
greater than that of the eutectic
point in the phase diagram. Using
the phase diagram, the optimal quantity of
solvent required can be calculated. It is also
possible to calculate the maximum expected yield.








Figure 18.23. 

Note should also be made that in some cases
recrystallization reduces the enantiomeric
excess of the crystals and can even lead to
crystallization of the racemate (94). In these
cases the mother liquors contain moderately to
highly enriched material. It is therefore
important to plan at which point in the strategy
the enantiomer is recrystallized to optical
purity. The enriched material may come
from an enzymic resolution or from an
asymmetric synthesis that has failed to
deliver enantiopure product. As discussed in
Section 3, the liquors from the diastereomeric
resolution of methylphenidate with DTTA, of
88% de, can be cleaved to
the free base, and crystallization of the
hydrochloride salt gives >98% ee. This is because
methylphenidate hydrochloride
has a eutectic point of 30% ee. Davies et al. (95)
and Winkler et al. (96) have prepared single
enantiomer methylphenidate (29). Their
approaches use an enantioselective synthesis;
the enantiomeric excesses are 86% and 69%,
respectively, thus requiring recrystallization
to deliver enantiopure product. Another
example of this type of compound is Warfarin
(13). Chemists at Dupont (97) developed an
asymmetric hydrogenation approach, which
gave Warfarin in ~80% ee. Simple crystallization
from an appropriate solvent yielded optically
pure Warfarin, thus indicating that the eutectic
point is below 80% ee. (See the earlier section
on the metabolism and binding properties of
the Warfarin enantiomers.)

The phase diagrams in Fig. 18.24 highlight two
typical cases: the first where the eutectic point
E is close to the racemate, and the second
where the eutectic approaches the single
enantiomer. In the first
case, it would be preferable to crystallize the
enriched enantiomer to optical purity, e.g.,
methylphenidate. In the second
case, however, a very stable racemic compound
exists, giving rise to a high eutectic point.
Here crystallization of an enriched enantiomer
mixture will only be successful at high ee. For
example, verapamil hydrochloride requires that
the ee be greater than 98% for crystallization
to yield enantiopure product. Below this, the
enantiopurity is reduced. In this case, it is
advantageous to recrystallize the diastereomeric
salt precursor to optical purity before
proceeding to final product.

4.3 Resolution by Direct Crystallization 

It is important to show how conglomerates are
identified. We have already seen that they
have specific phase diagrams, as shown in Fig.
18.23. Other data that support identification
of a conglomerate are IR and X-ray data,
and observation of a spontaneous resolution
or resolution by entrainment. It should be
noted that in 1848, Louis Pasteur separated
the dextrorotatory and levorotatory crystals of
sodium ammonium tartrate. This manual
sorting of crystals is also known as triage and
by its very nature is time consuming and
laborious. The reader is again directed towards
the monograph by Jacques et al., which lists over
250 known examples of conglomerates (57).
There are two possibilities for separation of
enantiomers by direct crystallization. The
first uses spontaneous resolution, which
occurs when a conglomerate crystallizes. This
crystallization may be followed by the
mechanical separation of the crystals of the two
enantiomers. Various techniques have been
developed that aid this separation.

The second type of resolution by direct
crystallization is known as entrainment.
Here, the differences in the rate of
crystallization of the enantiomers in a
supersaturated solution give rise to a separation.
Strict control of the crystallization conditions
is required: the system of crystals and
solution must not be allowed to come to
equilibrium, and time plays an important role. The
occurrence of conglomerates has been estimated
to be approximately 10% of all racemic
compounds. We will now illustrate this
phenomenon with some pertinent examples.

One example is the use of the conglomerate
Narwedine (59) in the synthesis of the natural
product Galanthamine (61), an Amaryllidaceae
alkaloid that has been used clinically
for 30 years for neurological illnesses (98).
More recently it has been approved for
the treatment of Alzheimer's disease (AD)
(99). Galanthamine inhibits
acetylcholinesterase (AChE), thus increasing the
levels of acetylcholine. An increase in the level of
acetylcholine in patients with AD has been
shown to improve their cognitive performance.
Galanthamine has been extracted
from botanical sources; however, several tons
of daffodil bulbs are needed to produce 1 kg of
product. A synthetic route has been developed
that uses a crystallization-induced chiral
transformation (Fig. 18.25). This crystallization
was first reported by Barton and Kirby
(100) and further developed by Shieh and
Carlson (101). The success of this transformation
is based on two phenomena: narwedine
(59) crystallizes as a conglomerate, and
(-)-narwedine (60) equilibrates with
(+)-narwedine through a retro-Michael
intermediate. This process has now been developed
so that (-)-narwedine (60) is routinely
obtained in 80% yield from the racemate input,
as shown in Fig. 18.25 (102).

Recently, a number of potent 5-HT3 receptor
antagonists such as Ondansetron have
been reported to be clinically effective for the
blockade of chemotherapy-induced nausea
and emesis (103). The structurally novel
compound (62) has also been shown to be a highly
potent 5-HT3 antagonist (104); specifically,
the R-(-)-(62) enantiomer was shown to be
the most active. Comparison of the physical
data of the racemate and single enantiomer
indicated that this structure (62) exists as a
conglomerate (104). By careful experimentation,
the best concentration, temperature, and
time for crystallization were discovered. Table
18.1 highlights the results obtained for the
entrainment.

The initial concentration of the solution
was 10.0 g of (±)-(62) in 50 g of acetone. In all
runs, 10 mg of seed crystals were used. From
the 11 runs listed in Table 18.1, 21.0 g of
R-(-)-(62) of >92.0% ee and 21.4 g of (S)-(+)-(62)
of >90% ee are obtained from an input of
50.4 g of racemate. The table also nicely
illustrates the continuous nature of the process,
which, coupled with the fact that no resolving
agent, chiral auxiliary, enzyme, or catalyst is
needed, underlines the economic advantages
of this type of process.
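
The totals quoted for the entrainment can be reproduced directly from the rows of Table 18.1. The Python sketch below is illustrative: the per-run values are transcribed from the table, while the tuple layout and the variable names are assumptions made for this example.

```python
# (run, racemate added in g, seed enantiomer, crystals in g, % ee of solid)
runs = [
    (1, 0.0, "R", 2.0, 95), (2, 4.0, "S", 4.6, 92), (3, 4.6, "R", 3.8, 95),
    (4, 3.8, "S", 3.9, 95), (5, 3.9, "R", 3.8, 95), (6, 3.8, "S", 5.0, 90),
    (7, 5.0, "R", 3.4, 92), (8, 3.4, "S", 3.9, 94), (9, 3.9, "R", 4.0, 93),
    (10, 4.0, "S", 4.0, 95), (11, 4.0, "R", 4.0, 95),
]

initial_charge_g = 10.0  # (±)-(62) dissolved in 50 g of acetone

# Total racemate consumed: initial charge plus every top-up
total_input_g = initial_charge_g + sum(added for _, added, _, _, _ in runs)

# Crystals harvested and the lowest ee observed, per enantiomer
harvest_g = {"R": 0.0, "S": 0.0}
lowest_ee = {"R": 100, "S": 100}
for _, _, seed, crystals, ee in runs:
    harvest_g[seed] += crystals
    lowest_ee[seed] = min(lowest_ee[seed], ee)

recovery = (harvest_g["R"] + harvest_g["S"]) / total_input_g
print(round(total_input_g, 1), round(harvest_g["R"], 1),
      round(harvest_g["S"], 1), lowest_ee, round(recovery, 3))
```

The sums agree with the text (50.4 g in, 21.0 g and 21.4 g out), which corresponds to about 84% of the racemate being recovered as enriched crystals over the sequence.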

Figure 18.25.

The importance of amino acids as building
blocks for asymmetric synthesis is well
documented (105). A number of amino acids have
been shown to exist as conglomerates.
Shiraiwa et al. have described the preferential
crystallization of racemic methionine
hydrochloride (106). The D- or L-methionine
hydrochloride obtained was, however, only ~75%
optically pure, requiring a further
recrystallization to furnish enantiopure product.
Shiraiwa et al. have also recently disclosed the
resolution of (2RS,3SR)-2-amino-3-chlorobutanoic
acid HCl, again using entrainment
(107). Here it was shown to be necessary to
conduct the crystallization in an ethanol 15 M
hydrochloric acid solvent mixture for optimal
results. By careful control of the conditions,
high levels of enantiomeric excess were
obtained in the crystalline salt.

Figure 18.26. (R)-(-)-(62).

Chemists in Japan have developed an excellent
approach to (+)-Diltiazem, which is a
coronary vasodilator (108). An intermediate
is successfully resolved using preferential
crystallization. A series of substituted
phenyl esters of the glycidic acid was prepared;
of the 30 synthesized, only one exhibited
conglomerate properties (109). This was the
3-(4-methoxyphenyl) glycidic acid
4-chloro-3-methylphenyl ester
(63). Table 18.2 summarizes the physical data
collected, which illustrate the conglomerate
nature of this compound.
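
The physical data in Table 18.2 can be read against the usual indicators of a conglomerate. The Python sketch below is a hedged illustration: the melting points and solubilities are those listed in Table 18.2, the expectation that a conglomerate racemate is roughly twice as soluble as the single enantiomer is the double solubility rule discussed by Jacques et al. (57), and the function name and the tolerance of 0.25 on that ratio are arbitrary assumptions chosen for this example.

```python
def conglomerate_indicators(mp_racemate_c, mp_enantiomer_c,
                            sol_racemate, sol_enantiomer,
                            ir_identical, ratio_tolerance=0.25):
    """Collect simple yes/no indicators that a racemate is a conglomerate."""
    ratio = sol_racemate / sol_enantiomer
    return {
        # A conglomerate racemate melts below the pure enantiomer
        "racemate_melts_lower": mp_racemate_c < mp_enantiomer_c,
        # Racemate solubility is expected to be about twice that of the enantiomer
        "solubility_ratio": ratio,
        "ratio_near_two": abs(ratio - 2.0) <= ratio_tolerance,
        # Identical solid-state IR spectra point to the same crystal form
        "ir_identical": ir_identical,
    }


# Lower ends of the quoted melting ranges; solubilities in g/100 mL
thf = conglomerate_indicators(123, 139, 14.0, 6.7, ir_identical=True)
dmf = conglomerate_indicators(123, 139, 13.0, 6.9, ir_identical=True)
print(round(thf["solubility_ratio"], 2), round(dmf["solubility_ratio"], 2))
```

All three indicators point the same way for (63): the racemate melts lower, is close to twice as soluble in both solvents, and gives the same IR spectrum as the single enantiomer.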

The single enantiomer (-)-epoxide (64)
obtained is then converted into the required
(+)-isomer of Diltiazem (65) in several steps,
as highlighted in Fig. 18.27.

Taxol is a natural product isolated in very
low yield from Taxus brevifolia and is used in
the treatment of cancer (110). The extreme
chemical complexity of Taxol makes production
by total synthesis uneconomical. However,
a semisynthetic approach, in which the
naturally derived 10-deacetylbaccatin III (66)
is condensed with N-benzoyl-(2R,3S)-3-phenylisoserine
(67), does provide an alternative
and economic approach (111). N-Benzoyl-(2R,3S)-3-phenylisoserine
(67) is also commonly
known as the Taxol side-chain and has been
prepared in optically active form using chiral
auxiliaries or resolving agents (112). It has
been shown that the Taxol side-chain is a
conglomerate and can therefore be cheaply and
efficiently entrained to the single required
enantiomer (113).

Table 18.1 Resolution of (62) by Preferential Crystallization

Run  (±) Added  Seed     Time       EE of Solution  Amount of     Rotation  %EE of
     (g)                 (minutes)  (%EE)           Crystals (g)            Solid
1    none       (R)-(-)  220        22.7/+          2.0           (-)       95
2    4.0        (S)-(+)  210        19.4/-          4.6           (+)       92
3    4.6        (R)-(-)  190        21.2/+          3.8           (-)       95
4    3.8        (S)-(+)  205        21.0/-          3.9           (+)       95
5    3.9        (R)-(-)  210        21.5/+          3.8           (-)       95
6    3.8        (S)-(+)  220        16.4/-          5.0           (+)       90
7    5.0        (R)-(-)  200        21.3/+          3.4           (-)       92
8    3.4        (S)-(+)  215        21.2/-          3.9           (+)       94
9    3.9        (R)-(-)  190        22.2/+          4.0           (-)       93
10   4.0        (S)-(+)  220        21.4/-          4.0           (+)       95
11   4.0        (R)-(-)  200        20.8/+          4.0           (-)       95

Reprinted from H. Harada, T. Morie, Y. Hirokawa, and S. Kato, Tetrahedron: Asymmetry, vol. 8, 1997, pp. 2367-2374.
Reproduced with permission from Elsevier Science.

5 ENZYME-MEDIATED ASYMMETRIC 
SYNTHESIS 

Enzymes have found frequent use in the
synthesis of single isomer drugs from racemic or
prochiral compounds at the larger
manufacturing scales. The use of enzymes to effect
chiral transformations in the medicinal
chemistry laboratory has been far less frequent;
however, the increasing availability of
immobilized and stabilized forms of enzymes has
made their use easier and the resultant
transformations more predictable.

By virtue of their complex macromolecular
structure, including a highly defined active
site, enzymes generally effect transformations
with a high degree of chemical selectivity
and stereospecificity. Reactions are typically
conducted under mild conditions of temperature,
pressure, and pH, thus minimizing
losses caused by unwanted side reactions or
partial racemization. The use of extremophiles
or cross-linked enzymes such as CLECs
does enable the use of higher temperatures,
pressures, and organic solvents.

Enzymes can be utilized to effect a number of transformations; the broad spectrum of reactions, including amide bond formation, hydrolysis, esterification, reduction, oxidation, and carbon-carbon bond formation, has been reviewed elsewhere (114).

5.1 Amide Bond Formation 

The use of enzymes to stereospecifically form amide bonds has been described in many texts (115); however, commercially available cross-linked enzyme crystals (CLECs) such as PeptiCLEC-TR, an immobilized form of the protease thermolysin, have now also been used in the synthesis of D2163 (68), a novel matrix metalloproteinase inhibitor (116). In vitro enzyme screening identified the all-natural SSS-isomer as the active product. The elegant CLEC (117) technology used in this example makes the enzyme stable to typical organic reaction conditions and enables facile removal of the enzyme at the end of the reaction by simple filtration. On this basis, it is anticipated that medicinal chemists will more commonly use these enzymes in the future.

The coupling of dipeptide (69) to the protected α-thio carboxylic acid (70) was conducted in organic solvent at high concentration with the desired product produced in a few hours with high enantiospecificity.

Table 18.2 Properties of (63) Indicating Conglomerate Nature

Compound   MP (°C)    Solubility in THF (g/100 mL)   Solubility in DMF (g/100 mL)   IR Spectrum
(±)-4.2    123-124    14.0                           13.0                           Identical
(-)-4.2    139-141    6.7                            6.9                            Identical

Figure 18.27.

5.2 Transesterification and Hydrolysis 

A widely used technique for separating racemic mixtures is enzyme-mediated transesterification or hydrolysis. One important example is the separation of Naproxen (33), which is a member of the 2-arylpropionic acid class of profens that are broadly used as NSAIDs (see Section 2 for the separation of enantiomers using a crystallization approach). The important association between chirality and biological activity of this class of drugs has been extensively researched; the role of cyclooxygenase-independent properties of the R-enantiomers in the gastrointestinal toxicity of the racemates, and the likelihood that the use of racemates increases the propensity of profens to alter the pharmacokinetics of other drugs, have been described (118).

Whereas not all profens are sold as single isomers, Naproxen is sold as the single S-enantiomer (36), and various strategies, including crystallization, chromatographic separation, asymmetric hydrogenation, and enzymatic hydrolysis and esterification, have been used to prepare the single isomer (65). Specific examples include the use of Candida cylindracea lipase to enantioselectively prepare single isomer naproxen ester with trimethylsilylmethanol (119) and the use of Candida rugosa lipase in an enantioselective continuous hydrolysis of Naproxen methyl ester (120).

Pipecolic acid is a component of a number of active drugs, including bupivacaine (38) and thioridazine (72) (Fig. 18.30). It has been efficiently resolved, as the racemic n-octyl pipecolate, with Aspergillus niger. The S-isomer is obtained as the free acid in a 40% yield based on the available enantiomer with a 97% ee (121).

Propranolol (14) is a broadly used β-adrenergic receptor blocking agent that is sold as the racemate. However, the majority of the activity is associated with the S-enantiomer (74) (see Section 2) (122). The asymmetric synthesis of the desired S-enantiomer has been achieved by the selective acylation of the R-enantiomer of the key intermediate (73) as shown in Fig. 18.30.




Figure 18.30. [Structures shown: thioridazine (72) and bupivacaine (38).]


5.3 Oxidation and Reduction 

In addition to the widely reported techniques of amide bond formation, transesterification, and hydrolysis, enzymic enantioselective oxidation is also used in the synthesis of single isomer drugs. Patel described the efficient oxidation of benzopyran (75), an intermediate in the synthesis of potassium channel openers (123). The transformation was effected with a cell suspension of Mortierella ramanniana with glucose over a 48-h period; the isolated product (77) was obtained in a 76% yield with an optical purity of 97% and a chemical purity of 98%, as shown in Fig. 18.32.

Reduction with a variety of enzymes has been reported (114), including baker's yeast for the reduction of α-methylene ketones to the corresponding α-methyl alcohol (124), a functionality that is present in a number of drugs. The reduction of an azidoketone (78) using Pichia angusta enzyme has been used in the synthesis of S-salmeterol (79) (125). Salmeterol (Serevent) is a potent, long-acting β2-adrenoreceptor agonist used as a bronchodilator in the treatment of asthma. Recently, Sepracor claimed that the S-enantiomer had a higher selectivity for β2 receptors and that it did not cause certain adverse effects associated with the administration of (±)- or (R)-salmeterol (126). The synthesis of (S)-salmeterol (79) is shown in Fig. 18.33.

6 ASYMMETRIC SYNTHESIS

Synthetic organic chemists have a vast array of tools at their disposal when faced with the challenge of preparing a chiral compound as a single enantiomer. The purpose of this section is to introduce the reader to some asymmetric approaches toward chiral drugs and medicinal compounds, highlighting examples where the stereoisomers behave differently in biological systems. There are many excellent books and reviews covering methods for asymmetric synthesis and their application to the preparation of pharmaceutical agents and complex natural products (127).

6.1 Chiral Pool 

The use of enantiopure starting materials from nature in the synthesis of chiral drugs is not only of great historical significance but remains of critical importance to the pharmaceutical industry. Consideration of the current biggest-selling single enantiomer drugs shows how important this approach is (8 of the top 10 in 1996 were obtained from chiral pool starting materials or by synthetic manipulation of fermentation products) (127). The optically pure starting materials that have been used in drug synthesis include amino acids, hydroxy acids, terpenes, alkaloids, carbohydrates, and many more structurally diverse compounds. There are many syntheses involving clever manipulation of chiral pool starting materials and use of these chiral centers to induce further asymmetry (i.e., by diastereoselective reactions). We will briefly consider some examples in which all or most of the chiral centers in the target molecule originate directly from nature.

Angiotensin-converting enzyme (ACE) inhibitors are used mainly for the treatment of cardiovascular disorders and are among the biggest selling drugs worldwide (128). Enalapril (80) is synthesized from the natural amino acids L-alanine and L-proline (129). Lisinopril (81) incorporates a lysine derivative (130). One of the chiral centers in Captopril (82) is derived from proline, but the other is generated by chemical or enzymatic resolution (131). Cilazapril (83) is a conformationally restricted second generation ACE inhibitor developed by Roche, and the core is synthesized from a glutamic acid derivative and an amino acid-derived pyridazine (128, 132).

There are many other examples of drugs based on an amino acid backbone. Stoner et al. recently reported a synthesis of the HIV protease inhibitor ABT-378 (Lopinavir) (84) (133). In a similar synthesis to that of the related compound, Ritonavir, key intermediate (85) is prepared by stepwise diastereoselective reduction of enaminone (86). This means that the existing chiral center, derived from natural L-phenylalanine (protected to 87), controls the formation of the two new stereocenters as discussed for chiral auxiliaries below. Two acylations then complete the synthesis, with the final chiral center clearly derived from L-valine.

Figure 18.32.

Figure 18.33. [Scheme labels recovered: AcOH, water; (S)-Salmeterol (79).]

The stereospecificity of binding at the histamine H3-receptor was investigated by preparing a series of ligands from D- or L-histidine (88) (134). It was found that compounds such as (S)-(89) had greater affinity for the receptor than their R-enantiomers. In addition, replacing the aromatic moiety with a cyclohexyl group (e.g., 90) switched the activity to agonism for compounds with an amino group in the chain.
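The pKi values given for (89) and (90) in Figure 18.36 can be read as fold differences in affinity. The following is a minimal sketch, assuming the usual convention pKi = -log10(Ki); the function name is hypothetical.

```python
def fold_difference(pki_high, pki_low):
    """Affinity ratio implied by two pKi values (pKi = -log10 Ki)."""
    # One pKi unit corresponds to a 10-fold change in Ki.
    return 10 ** (pki_high - pki_low)


# pKi values at the H3 receptor as labeled in Figure 18.36
fold_89 = fold_difference(7.1, 6.7)  # (S)-(89) versus (R)-(89)
fold_90 = fold_difference(7.9, 6.8)  # (S)-(90) versus (R)-(90)

print(f"(89): S-isomer binds about {fold_89:.1f}-fold more tightly")
print(f"(90): S-isomer binds about {fold_90:.1f}-fold more tightly")
```

Under that convention the stereochemical preference is modest for (89) and roughly an order of magnitude for (90).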

Hydroxy acids are important chiral starting materials in the synthesis of many biologically active compounds (135). (S)-3-Hydroxy-γ-butyrolactone (91) is a very useful synthetic unit available from D-pyranoses (136). Workers at Schering-Plough used this as the key starting material in a concise synthesis of Sch 57939 (92), a β-lactam-based cholesterol absorption inhibitor (137). The condensation between the dianion of (S)-3-hydroxy-γ-butyrolactone and an appropriate diaryl imine proceeded with very high diastereo- and enantioselectivity, generating azetidinone (93) with a trans:cis ratio of >95:5.
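Diastereomer ratios and diastereomeric excess (de) are two ways of stating the same selectivity, and the chapter uses both. The sketch below converts between them, assuming the standard definition de = (major - minor)/(major + minor) x 100; the helper names are illustrative only.

```python
def de_from_ratio(major, minor):
    """Diastereomeric excess (%) from a major:minor ratio."""
    return 100 * (major - minor) / (major + minor)


def ratio_from_de(de_percent):
    """Major:minor ratio (normalized to 100) from a diastereomeric excess."""
    major = (100 + de_percent) / 2
    return major, 100 - major


# A 95:5 trans:cis ratio, the lower bound quoted for azetidinone (93)
de_93 = de_from_ratio(95, 5)

# The 78% de labeled for (114) in Figure 18.41, expressed as a ratio
ratio_114 = ratio_from_de(78)

print(f"95:5 corresponds to {de_93:.0f}% de")
print(f"78% de corresponds to {ratio_114[0]:.0f}:{ratio_114[1]:.0f}")
```

The same two functions apply unchanged to enantiomer ratios and ee.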

Researchers at Abbott have been investigating the use of pyrrolidinylisoxazoles as nicotinic cholinergic channel activators (138). Until recently, ABT-418 (97) was undergoing clinical trials as a potential treatment for cognitive impairment and decline and for Alzheimer's disease. A short synthesis of ABT-418 was devised starting from commercially available (S)-pyroglutamic acid methyl ester (94) (139). Acetone oxime dianion was added to the methyl ester (94) to generate an intermediate (95). Racemization of the chiral center was found to occur under basic conditions; however, this was avoided by immediate treatment with concentrated sulfuric acid, resulting in cyclization and dehydration to amide isoxazole (96). Reduction and N-methylation yielded ABT-418 (97). The binding affinity of ABT-418 at neuronal cholinergic channel receptors was measured to be one order of magnitude greater than that of the corresponding R-enantiomer (Ki = 4.2 versus 44 nM) (138).

Figure 18.34. [Structure label recovered: Cilazapril (83).]

Figure 18.35. [Scheme labels recovered: Lopinavir; (87); (86) (89-93%).]

Figure 18.36. [Labels recovered: (S)-(89) (pKi H3 = 7.1), cf. (R)-(89) (pKi H3 = 6.7); (S)-(90) (pKi H3 = 7.9), cf. (R)-(90) (pKi H3 = 6.8).]
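The "one order of magnitude" statement can be checked directly from the two Ki values quoted. This is a small sketch under the usual convention that pKi = -log10(Ki in mol/L); the function name is hypothetical.

```python
import math


def pki_from_ki_nm(ki_nm):
    """Convert a Ki in nanomolar to pKi (= -log10 of Ki in mol/L)."""
    return -math.log10(ki_nm * 1e-9)


ki_s, ki_r = 4.2, 44.0  # nM, ABT-418 and its R-enantiomer as quoted in the text

ratio = ki_r / ki_s
delta_pki = pki_from_ki_nm(ki_s) - pki_from_ki_nm(ki_r)

print(f"Ki ratio (R/S): {ratio:.1f}")
print(f"difference in pKi: {delta_pki:.2f} log units")
```

A ratio close to 10, or a pKi difference close to 1, is what "one order of magnitude" means here.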


6.2 Chiral Auxiliary 

In this approach the substrate is attached to a chiral, non-racemic unit that controls the formation of one or more new chiral groups. Reaction of the coupled unit with a reagent or prochiral substrate is designed to produce one diastereomeric product in excess. The auxiliary is then removed (and preferably recovered), providing the product in high enantiomeric excess. This process is most attractive when both isomers of the auxiliary are readily available in enantiomerically pure form, and where the reaction leads to high levels of stereoselectivity in a predictable manner. Attaching and removing the auxiliary should be straightforward and proceed without loss of stereochemical integrity.

Figure 18.38. [Scheme labels recovered: ABT-418 (97) (ee >99%); (96).]

Figure 18.39.

Many auxiliaries currently in use are derived from 1,2-amino alcohols (140). These are readily available from natural sources with little or no synthetic manipulation and can react in a variety of ways to form conformationally well-defined (usually cyclic) auxiliary systems. The use of oxazolidinones in asymmetric synthesis was developed by Evans et al., and these oxazolidinones have been used extensively in a variety of different reactions (140, 141). The use of this chiral auxiliary in the preparation of pharmaceuticals is widespread, and there are several large-scale processes using such chemistry (142).

Abbott reported an improved synthesis of ABT-627 (98) involving an asymmetric alkylation of the valine-derived acyl oxazolidinone (99) (143). ABT-627 (Atrasentan) is a selective endothelin ETA receptor antagonist under development for the treatment of cancer, particularly prostate cancer. Acid (100) was activated as a mixed anhydride and treated with the lithium anion of the oxazolidinone to give (101). Both of the following deprotonation and alkylation steps must be controlled to give high levels of stereoselectivity. The (Z)-enolate (102) is favored, both kinetically and thermodynamically, by the bulky iso-propyl group and is held rigid by chelation to the carbonyl oxygen of the oxazolidinone. Alkylation of this chelated enolate anion from the least hindered "upper" face then yields (103) as the major product. There are many strategies for removal and recovery of an oxazolidinone auxiliary (141). In this case, hydrolysis with lithium peroxide provides the acid that is transformed into Atrasentan through a cyclization-ring contraction strategy controlled by the chirality present in (103).

Table 18.3 Stereochemical Variation

(3a, 6)            Ki (nM versus HIV-1)   IC50 (μM, cell HIV-1)
R, R Tipranavir    0.008                  0.03
R, S               0.018                  0.14
S, R               0.032                  0.41
S, S               0.22                   1.7

Tipranavir (PNU-140690) is a potent third-generation HIV protease inhibitor in clinical development by Boehringer Ingelheim (under license from Pharmacia). The biological activity of such 5,6-dihydro-4-hydroxy-2-pyrone sulfonamides shows considerable stereochemical variation (Table 18.3) (144). The R-configuration is preferred at both chiral centers (3a and 6), and Tipranavir is more than 50 times as potent as its enantiomer in a cell culture assay using HIV-1 IIIB-infected H9 cells. An asymmetric synthesis (145) begins with the Michael addition of an aryl cuprate (derived from commercially available Grignard reagent 105) to the unsaturated oxazolidinone imide (104), yielding the adduct as a single diastereomer (106). The nitrogen protecting group was changed and an acetyl group introduced to give ketone (107), which undergoes a stereoselective aldol reaction with an acetylenic ketone (108). The highest diastereoselectivity was obtained for this reaction using Ti(OnBu)Cl3 as the Lewis acid. Both of the critical asymmetric steps to form new chiral centers are controlled by the (R)-phenyl oxazolidinone. The chiral auxiliary is removed when (109) is treated with base to form the lactone ring. This is followed by two further steps that generate PNU-140690 (110) as a single enantiomer.
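The "more than 50 times as potent" comparison follows from the values in Table 18.3. The sketch below recomputes the fold differences relative to the R,R isomer; the dictionary layout and function name are illustrative assumptions, and the numbers are simply those transcribed from the table (Ki in nM, cell-culture inhibitory concentration in μM).

```python
# (Ki in nM, cell-culture inhibitory concentration in uM), keyed by the
# configuration at centers (3a, 6); values transcribed from Table 18.3
TABLE_18_3 = {
    "R,R": (0.008, 0.03),  # Tipranavir
    "R,S": (0.018, 0.14),
    "S,R": (0.032, 0.41),
    "S,S": (0.22, 1.7),    # enantiomer of Tipranavir
}


def fold_weaker(isomer, reference="R,R"):
    """How many times weaker an isomer is than the reference, per assay."""
    ki, cell = TABLE_18_3[isomer]
    ref_ki, ref_cell = TABLE_18_3[reference]
    return ki / ref_ki, cell / ref_cell


ki_fold, cell_fold = fold_weaker("S,S")

for name in TABLE_18_3:
    enzyme, cell = fold_weaker(name)
    print(f"{name}: {enzyme:.1f}x (enzyme), {cell:.1f}x (cell culture)")
```

The enzyme-level difference between enantiomers is smaller than the cell-culture difference, which is the figure the text quotes.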

The enantioselective synthesis of dopaminergic benzyltetrahydroisoquinolines and their binding to D1 and D2 dopamine receptors was investigated by Cabedo et al. (146). The synthetic route, illustrated by the preparation of the (1S)-isomer, involves stereoselective reduction of the isoquinolinium salt (114) with (R)-phenylglycinol (introduced in protected form as 112) as the chiral auxiliary. The (1R)-enantiomer of (115), prepared in an analogous fashion using (S)-phenylglycinol, binds to dopamine receptors with considerably less affinity (>100 μM versus D1 and 61.2 μM versus D2). In contrast, stereochemical differentiation was not observed at the dopamine uptake site for these compounds.

Two different chiral auxiliary approaches have been applied to the synthesis of NPS 1407 and its enantiomer (119) (147). NPS 1407 is an antagonist of the glutamate NMDA receptor that has in vivo activity in neuroprotection and anti-convulsant assays. The R-enantiomer was synthesized in four steps from (116) with the chiral center introduced by a completely stereoselective alkylation of hydrazone (117). The chiral auxiliary, (S)-(-)-1-amino-2-(methoxymethyl)pyrrolidine (SAMP), was introduced by condensation with aldehyde (116) and removed by catalytic hydrogenolysis. In the second method, the S-enantiomer was formed in a four-step sequence with the chiral center installed by the Michael addition of chiral amine (121) (formed in one step from the readily available α-methylbenzylamine) to benzyl crotonate (120). NPS 1407 (123) was found to be 12 times more potent than its enantiomer (119) at the NMDA receptor in an in vitro assay.

An example of the use of a terpene as a chiral auxiliary is provided by the synthesis of the anti-viral reverse transcriptase inhibitor Lamivudine (148). The nucleoside analog is marketed by Biochem Pharma (now Shire Pharmaceuticals) and Glaxo Wellcome (now GlaxoSmithKline) for the treatment of HIV and hepatitis B virus infection. In the production route, the glycolate derived from (-)-menthol (124) is coupled with thioacetyl dimer (125). The chiral auxiliary directs reaction to install the desired (2R)-stereochemistry in (126). In situ formation of chloro compound (127) is followed by a stereoselective coupling reaction with trimethylsilyl cytosine, again directed by the (-)-menthyl carboxylate. Reductive removal of the auxiliary yields Lamivudine (129) as a single isomer that was found to have more favorable toxicological and pharmacokinetic properties than the racemate.

Figure 18.40.

6.3 Chiral Reagent 

In this approach, asymmetry is induced in a prochiral molecule or functional group by reaction with a stoichiometric amount of an enantio-enriched reagent system. The reaction proceeds through diastereomeric transition states, resulting in the preferential formation of one enantiomer or diastereomer. Current reagents can lack generality and may be difficult to prepare in both chiral forms. At least one equivalent of the chiral component is required, which can present economic and practical difficulties. Many examples are provided by the reduction of double bonds, especially those of ketones. Ketone (130) was reduced enantioselectively using either (+)- or (-)-B-chlorodiisopinocampheylborane (149). Reduction with (-)-B-chlorodiisopinocampheylborane generated the alcohol (S)-(131), which was transformed into the isochroman compound (1R,3S)-(132) through a stereospecific cyclization to form the cis stereochemistry across the ring. The enantiomer, (1S,3R)-(132), was prepared in a directly analogous manner by reduction with (+)-B-chlorodiisopinocampheylborane. This study represents another example of stereodifferentiation at the dopamine receptors, with nearly a 5000-fold difference in D1 potency observed between the two isomers in an in vitro assay.

Figure 18.41. [Scheme labels recovered: (1S)-(115) (16.6 μM vs D1; 14.7 μM vs D2); (114) (78% de).]

A recent paper by scientists at Bristol-Myers Squibb reports the synthesis of a new class of calcium-activated potassium channel modulators (150). The compounds were investigated for their ability to increase channel opening at large conductance (BK or maxi-K) channels and showed a limited degree of stereospecificity. The key step in the synthesis is the direct oxidation of the enolate derived from (133) with either isomer of camphorsulfonyl oxaziridine, a reagent developed by Davis (151). Both enantiomers of the 3-aryl-3-hydroxyindol-2-ones were prepared with very high enantiopurity (>95% ee) using opposite enantiomers of the chiral oxaziridine. (-)-(134) was found to be a better activator of a cloned BK channel than the (+)-isomer at 20 μM, generating a current increase of 141% compared with 124% for (+)-(134).


6.4 Chiral Catalyst 

The use of a chiral catalyst represents the ideal method for asymmetric synthesis because only small amounts of the chiral mediator are required and no modifications of the prochiral substrate are necessary. In many systems both enantiomers of the product can be prepared in a predictable and reproducible manner. The pharmaceutical industry is particularly interested in the capability of new catalyst systems to operate as reliable manufacturing processes on a large scale (127, 152). Substantial effort continues to be expended by the synthetic organic community with the goal of extending the number of efficient and broadly applicable catalyst systems capable of generating high levels of enantiomeric excess in a wide range of substrates (127).

The reduction of ketones by borane catalyzed by chiral oxazaborolidines such as (136), derived from the enantiomeric amino alcohols, has been applied to the synthesis of several drug candidates (127). This system is known as the CBS (Corey, Bakshi, Shibata) reduction (153), and Corey himself has applied it to the synthesis of pharmaceutical compounds (154). A further example is provided by the synthesis of MK-499 (137), a potassium channel blocker that was developed for the treatment of cardiac arrhythmia by Merck (155).

Figure 18.42. [Scheme labels recovered: NPS 1407 (123) (IC50 = 0.089 μM); (122).]

Asymmetric hydrogenations with transition metal catalysts have been applied to single enantiomer synthesis in the pharmaceutical industry with considerable success. ChiroTech and Pfizer developed an improved synthesis of glutarate derivative (139), an intermediate required for the synthesis of Candoxatril (140) (156). The drug, a neutral endopeptidase inhibitor, was in clinical development for the treatment of hypertension and congestive heart failure, and its enantiomer does not possess the same biological activity. Several catalysts and conditions were screened before arriving at optimized conditions using cationic rhodium-(R,R)-MeDuPHOS (141), which provided the product with complete enantioselectivity and avoided previously observed problems associated with isomerization of the enone starting material. The reaction could be conducted at a high substrate-to-catalyst ratio of 3500:1 without a detrimental effect on enantiomeric excess or reaction rate. In catalytic asymmetric reactions, it is clearly economically advantageous to minimize the amount of catalyst, which may comprise expensive chiral material and transition metals.

Figure 18.43. [Scheme labels recovered: NaBH4; TMS-cytosine; Lamivudine (129).]
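To put the 3500:1 substrate-to-catalyst ratio in more familiar terms, it can be restated as a catalyst loading in mol%. The sketch below is plain arithmetic under the assumption that the ratio is molar; the function names are hypothetical.

```python
def catalyst_loading_mol_percent(substrate_to_catalyst):
    """Catalyst loading in mol% for a given molar substrate:catalyst ratio."""
    return 100 / substrate_to_catalyst


def catalyst_mmol(substrate_mol, substrate_to_catalyst):
    """Millimoles of catalyst needed for a given amount of substrate."""
    return 1000 * substrate_mol / substrate_to_catalyst


loading = catalyst_loading_mol_percent(3500)
mmol_per_mol = catalyst_mmol(1.0, 3500)

print(f"S/C 3500:1 is about {loading:.3f} mol% catalyst")
print(f"about {mmol_per_mol:.2f} mmol catalyst per mole of substrate")
```

For comparison, the 10 mol% and 1 mol% loadings that appear in the scheme labels for Figures 18.46 and 18.50 correspond to ratios of 10:1 and 100:1.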

A method for the asymmetric dihydroxylation of alkenes to yield cis-diols was developed by the research group of Sharpless using chiral ligands derived from the cinchona alkaloids dihydroquinidine (DHQD) and dihydroquinine (DHQ) with a catalytic amount of osmium tetroxide (157). Although they are diastereomers, the phthalazine ligands act as "pseudo-enantiomeric" ligands, i.e., they give opposite asymmetric induction in a predictable manner. This procedure was recently used to prepare both isomers of combretadioxolane (144), a chiral analog of the natural product Combretastatin A-4 (146) (158). Combretastatin A-4 displays antitubulin activity and cytotoxicity to tumor cells and is therefore an interesting lead structure for new anticancer drugs. The asymmetric synthesis of (S,S)-combretadioxolane (144) involved treatment of the trans-stilbene (142) with "AD-mix-α" [containing (DHQ)2-PHAL] (145), whereas the enantiomer (R,R)-combretadioxolane resulted from use of AD-mix-β, which contains (DHQD)2-PHAL as the chiral ligand. The tubulin polymerization-inhibitory activity of (S,S)-combretadioxolane was comparable with combretastatin A-4 (IC50 = 4-6 μM) in an in vitro assay, whereas (R,R)-combretadioxolane was essentially inactive (IC50 > 50 μM). In addition, (S,S)-combretadioxolane was 20 times more potent than vincristine as an in vitro growth inhibitor of the multidrug-resistant cell line PC-12.

Workers at SmithKline Beecham reported the stereoselective synthesis of inhibitors of the cysteine protease cathepsin K (159). A procedure was sought to allow preparation of either enantiomer of azido alcohol (148). This was readily achieved by Jacobsen asymmetric desymmetrization of the meso-epoxide (147) using azidotrimethylsilane catalyzed by chromium salen complex (149) (160). Use of the (R,R)-salen catalyst shown generated (3S,4R)-(148), whereas the (S,S)-catalyst provided the (3R,4S)-azido silyl alcohol, both with very high enantioselectivity. After removal of the silyl group and reduction of the azido moiety, the resultant enantiomeric amino alcohols were transformed into diastereomers (4S)- and (4R)-(150) by reaction with leucine, amide formation, and oxidation. The cathepsin K activity for the diastereomers showed the (4S)-isomers to be up to 40-fold more potent than the corresponding (4R)-(150) in an enzyme assay.

Figure 18.44. [Scheme labels recovered: (+/-)-(133); (+)-(134) (>95% ee); (1R,3S)-(132) (>98% ee).]

A large scale synthesis of the drug Nelfinavir, an HIV protease inhibitor developed by Agouron (now Pfizer), was reported with the amino alcohol derived from (148), prepared using the Jacobsen procedure described above (161).

A similar approach uses the chromium-Salen complex (149) to open racemic terminal epoxides in a highly efficient resolution process that has been applied to the synthesis of biologically active compounds (162). As with any such resolution process, the maximum yield of enantiopure material is 50% based on starting material. Terminal epoxides are easy to prepare in racemic form and, conversely, difficult to prepare as single enantiomers by epoxidation of the corresponding alkene. (R)-9-[2-(Phosphonomethoxy)propyl]adenine (R-PMPA) is a nucleotide reverse transcriptase inhibitor being developed by Gilead Sciences and a collaborative group from the University of Washington for the treatment and prevention of HIV infection (163). The compound can be prepared through kinetic resolution of propylene oxide using (S,S)-(149), and the resultant (R)-1-amino-2-propanol (153) was transformed into (R)-PMPA (154) in five steps (162).

Figures 18.45 to 18.48. [Schemes not reproduced. Labels recovered: (136) (10 mol%), H3B-SMe2, i-PrOH/CH2Cl2, MeOH; MK-499 (137) (98% de; 92% yield); (138), [((R,R)-Me-DuPHOS)Rh(COD)]BF4, H2 (5 atm)/MeOH (COD = 1,5-cyclooctadiene), (139) (>99% ee, 95% yield); (S,S)-(143) (>99% ee; 89% yield), (DHQ)2-PHAL (145), (S,S)-Combretadioxolane (144), Combretastatin A-4 (146).]
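The 50% ceiling on a resolution explains why yields for such steps are often quoted against the limiting reagent rather than the racemate. As a sketch, the labels recovered for Figure 18.50 (0.5 equiv. of TMSN3 and a 98% yield based on TMSN3) can be re-expressed against the racemic epoxide; it is an assumption here that the equivalents are stated relative to the racemate, and the function names are hypothetical.

```python
def yield_on_racemate(yield_on_reagent_percent, reagent_equiv):
    """Re-express a yield quoted on a sub-stoichiometric reagent
    as a yield based on the racemic starting material."""
    return yield_on_reagent_percent * reagent_equiv


def percent_of_theoretical(yield_on_racemate_percent, max_percent=50.0):
    """Fraction of the 50% resolution ceiling actually achieved."""
    return 100 * yield_on_racemate_percent / max_percent


overall = yield_on_racemate(98, 0.5)
achieved = percent_of_theoretical(overall)

print(f"yield based on racemic epoxide: {overall:.0f}%")
print(f"fraction of the 50% maximum: {achieved:.0f}%")
```

Read this way, the step delivers close to the theoretical maximum even though less than half of the racemic starting material ends up in the product.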

In 1997, Tokunaga et al. reported the hydrolytic kinetic resolution of racemic terminal epoxides using a Co(III)-Salen catalyst (164). This remarkably general process uses only water as the nucleophile and provides the synthetically useful chiral epoxides and diols in highly enantioenriched form. The catalyst can be recycled and the reactions conducted under solvent-free conditions. The process has been used by academic and industrial groups and is operated by Rhodia ChiRex on a large scale (165).

A wide variety of synthetic processes have been rendered asymmetric through the use of a chiral catalyst. In addition to the types of reaction described above, chiral transition metal catalysts have been used to influence the stereochemical course of isomerization, cyclization, and coupling reactions. As an example, an approach towards the natural product (-)-epibatidine (158) was recently reported by Namyslo and Kaufmann (166). Epibatidine is a potent analgesic and a nicotinic receptor agonist. The synthesis involves an asymmetric Heck-type hydroarylation between the bicyclic alkene (155) and pyridyl iodide (156). A number of bidentate chiral ligands were investigated, and BINAP (159) was observed to give the highest enantioselectivity. By using the (R)- or (S)-BINAP ligand, both enantiomers of (157) were accessible with about the same level of enantioselection.

The continuing development of efficient and practical asymmetric processes will be one of the major driving forces in the future of drug discovery and development. In particular, the design of new general and practical catalytic processes will help explore the link between chirality and biological activity.

Figure 18.49. [Label recovered: (4S)-(150).]

Figure 18.50. [Scheme labels recovered: TMSN3 (0.5 equiv.), (S,S)-6.51 (1 mol%); (152) (97% ee; 98% yield based on TMSN3); (153) (84% yield).]

Figure 18.51. [Scheme labels recovered: (156); (157) (81% ee; 53% yield); (R)-BINAP (159); Epibatidine (158).]

7 CONCLUSIONS

The ultimate focus of the endeavors of medicinal chemists is to develop a successful drug that will cure patients. However, with the increased regulatory requirements within the competitive biotechnology and pharmaceutical industry, the initial research to achieve this objective must be conducted in a rapid and thorough manner. During the drug research and development process, the important and subtle relationship between chirality and biological activity should be carefully considered.


Enantiomers frequently display markedly different biological activity; however, the availability of a large and adaptable toolbox of chemical and biological techniques for obtaining single isomers allows the medicinal chemist to avoid working with mixtures of stereoisomers.

As reviewed in this chapter, there are numerous synthetic strategies available to the medicinal chemist, each with its own particular drawbacks and advantages. In the early stages of research it may be preferable to separate isomers by chromatography, thus providing both single enantiomers for biological testing. It should be noted that all the techniques described in this chapter can be used in conjunction with one another. That is to say, if one technique such as asymmetric synthesis fails to deliver enantiopure material, then another technique such as crystallization can be used to bring the product to the desired purity. As an example of this "double" approach, the use of SMB and crystallization in the separation of mandelic acid is worthy of note (56). The use of asymmetric hydrogenation followed by asymmetric enzymic transformation to obtain single isomer products has also been described by Taylor et al. at ChiroTech (167).

In conclusion, if a chiral center is present in a molecule designed and
synthesized by a medicinal chemist, a broad range of methods is
available to prepare or isolate either isomer. As the examples given in
this chapter show, stereoisomers frequently display markedly different
biological properties, and the desirable properties associated with one
isomer may not be apparent when the corresponding racemic mixture is
tested either in vivo or in vitro.
1 INTRODUCTION

The increased acceptance and availability of various structure-activity
relationship (SAR) approaches in health hazard identification (1, 2) is
accompanied by many opportunities and some pitfalls. The latter derive
from the availability of various computer-based SAR platforms whose
basis and performance characteristics are not transparent to the user.
Such programs, in the hands of the non-expert, may be misused. SAR
models and associated technologies, on the other hand, while not crystal
balls, provide the expert toxicologist with meaningful information
regarding the putative toxicological profile of candidate agents. They
can guide the design of agents that have fewer or no unwanted side
effects and yet retain or even enhance therapeutic effectiveness.
Finally, SAR technology can provide insights into the mechanism whereby
a chemical exerts its toxic effects and thereby provide a better
understanding of the risk that the agent poses to humans.

However, to achieve these aims, it is essen¬ 
tial that the performance characteristics and 
the basis of the SAR model be known. This 
involves several critical steps in the SAR 
model development. These are listed below, 
each of which will be amplified: 

1. Development of database 

2. Model building 

3. Model characterization 

4. Model validation 

5. Model application to individual agents 
and for mechanistic evaluations 

Although the present review focuses on the 
MULTICASE SAR methodology (3-5), the 
concepts discussed herein apply to all gener¬ 
ally available SAR techniques used to study 
toxicological phenomena. Basically, SAR approaches that have been used
fall into two categories: those that are based on statistical automated
algorithms not dependent on prior expert judgment and those that are a
priori rule-based, requiring prior expert knowledge (Table 19.1).
However, as will be stressed herein, even the approach not requiring
prior expert input is very much dependent on human expertise at various
stages of the model development and interpretation process.

Table 19.1 Some SAR Approaches Used in Toxicology

Designation          Approach^a   References
MULTICASE            I            3-5
TOPKAT               I            6
COMPACT              I            7
DEREK                II           8-10
ONCOLOGIC            II           11, 12
Hazard Expert        II           13
PROGOL               I            14
Structural Alerts    II           15

^a Approach I indicates statistical automated algorithms not dependent
on prior expert judgment. Approach II indicates a rule-based technique
that requires prior expert judgment.

Reviews and assessments of the various 
SAR methodologies used to analyze toxicolog¬ 
ical phenomena are available (16-21). 

1.1 Development of Database 

Most experimental data compilations of toxicological effects, both in
the public domain and in proprietary databases, were not developed for
SAR purposes. Thus, with respect to some toxicological phenomena, the
database may be rich in certain chemical classes, such as chloroarenes,
and lacking in data relating to others, e.g., aminoarenes. Yet, unlike
the SAR models developed for drug discovery, SAR models of toxicological
phenomena must be able to handle databases composed of non-congeneric
chemicals. Additionally, as a consequence of how toxicological data are
generated, there may be a paucity of data altogether. Yet, for optimal
SAR models of toxicological phenomena, the "learning set" should include
at least 300 non-congeneric chemicals (3, 22).

Accordingly, the human expert may suggest that, for certain purposes,
the results of certain assays be pooled, e.g., rat and mouse
carcinogenicity or the results of the Salmonella and E. coli WP uvrB
mutagenicity assays (23, 24). Obviously, such data pooling must rest on
a sound scientific basis as well as on data that show extensive
concordance between the experimental results of the systems to be
pooled, i.e., a substantial number of chemicals must give identical
results in the two systems, thereby indicating that results obtained
with one system can be amalgamated with those obtained in the other
(25). For example, when the same chemicals were tested for their ability
to induce sister chromatid exchanges and chromosomal aberrations in
cultured Chinese hamster ovary (CHO) cells, they showed divergent
results (26). Hence, the results of the two assays cannot be amalgamated
into a single database to develop an SAR model of cytogenetic effects.
Similarly, even when the same indicator system is used, results cannot
be merged if different criteria are used to interpret the significance
of the results. That situation prevails with respect to the induction of
mutations at the thymidine kinase locus of mouse lymphoma cells, where
the criteria used by the U.S. National Toxicology Program differ from
those employed by the U.S. Environmental Protection Agency's GeneTox
Program. In fact, each data set gives rise to a distinct SAR model
(27-29).
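
The concordance check that should precede any pooling of assay results can be sketched in a few lines of Python. This is only an illustration: the function name, the chemical identifiers, and the positive/negative calls are hypothetical, and the text does not state what level of agreement counts as "extensive."

```python
def concordance(results_a, results_b):
    """Fraction of chemicals tested in BOTH assays that gave the same call.

    results_a, results_b: dicts mapping chemical name -> True (positive)
    or False (negative). Returns (fraction_in_agreement, number_shared).
    """
    shared = sorted(set(results_a) & set(results_b))
    if not shared:
        raise ValueError("no chemicals were tested in both assays")
    agree = sum(results_a[c] == results_b[c] for c in shared)
    return agree / len(shared), len(shared)


# Hypothetical calls for two assays (not data from the chapter).
assay_1 = {"chem_A": True, "chem_B": True, "chem_C": False, "chem_D": True}
assay_2 = {"chem_A": True, "chem_B": False, "chem_C": False, "chem_E": True}

fraction, n_shared = concordance(assay_1, assay_2)
print(f"{n_shared} shared chemicals, concordance = {fraction:.2f}")
```

Only chemicals tested in both systems enter the comparison; whether the resulting fraction justifies pooling remains a matter for expert judgment.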

On the other hand, the consensus database 
of potential developmental toxicity in humans, 
based on experimental results in animals, ob¬ 
servations in exposed humans, and expert 
judgment, yields a coherent SAR model of developmental
risks to humans (30). That model
is distinct from SAR models of developmental 
toxicity to individual rodent species (31). 

1.2 Model Building 

Once a "learning set" (i.e., database) satisfying
preset criteria for acceptance (3) has been
developed, the model building phase can be¬ 
gin. In general, this is a straightforward pro¬ 
cess that is specific for the SAR method em¬ 
ployed. 

Here, I will exemplify the various stages with the MULTICASE SAR system
(3-5, 32). Once the structures of the chemicals and an indication of
their potency (i.e., either active, marginally active, or inactive, or a
continuous scale of potencies) are entered, the program identifies the
chemical substructures significantly associated with the toxicological
phenomenon under investigation (Table 19.2). Each of these structural
determinants ("toxicophores") is associated with a base potency and a
probability of activity (see Fig. 19.1). The latter is derived from the
distribution of active and inactive molecules that contain the
toxicophore. The program also identifies the chemicals that give rise to
the toxicophore (Table 19.3 and Fig. 19.7). This enables the human
expert (see below) to ascertain whether the structures of the chemicals
giving rise to the toxicophores are germane to the test chemical whose
toxicity is predicted.

In addition to the toxicophores, the pro¬ 
gram also identifies modulators for specific 
toxicophores (Table 19.4). These are substruc¬ 
tures or physicochemical parameters that de¬ 
termine whether the specific toxicity inherent 
in the toxicophore will be expressed or 
whether it is augmented further. 

Table 19.2 Major Toxicophores Associated with Allergic Contact Dermatitis in Humans

No.  Fragment                        N*   Inactives*  Marginals*  Actives*
1    N—CH2—                          50   1           0           49
2    N—CH3—                          26   1           0           25
3    Cl—CH2—                         8    0           0           8
4    OH—c=                           86   10          0           76
5    SH—CH2—                         19   0           1           18
6    NH—CH2—                         26   1           1           24
7    NH2—CH2—                        7    0           0           7
8    cH=cH—c=cH— (3—NH2)             18   0           0           18
9    CH"—CO—CH=                      15   0           0           15
10   O—CO—c=CH2                      21   1           5           15
11   NH—c=cH—                        17   0           1           16
12   CO—CH=CH—                       9    0           0           9
13   N=C—                            24   1           0           23
14   CO—N—                           14   0           0           14
15   OH—CO—c=                        8    1           0           7
16   cH=c—c=c—cH=                    25   1           0           24
17   cH=cH—cH=cH—cH=c)—              17   4           0           13
18   S—c.=                           10   0           0           10
19   CO—CH2—CH2—c=                   5    5           0           0

The database and derivation of the SAR model have been described (33).

*N indicates the number of chemicals in the database that contain that
toxicophore. "Inactives," "marginals," and "actives" indicate the
distribution of that toxicophore among activity groups.

Toxicophore No. 4 is shown embedded in chemicals in Figs. 19.1-19.3 and
No. 5 is shown in Fig. 19.5.

c. indicates a carbon atom shared by two rings; (3—NH2) indicates an
amino group attached to the third non-hydrogen atom from the left. In
toxicophore No. 17, the last carbon to the right is shown as
unsubstituted. This means that it can be substituted with any atom
except a hydrogen. On the other hand, in toxicophore No. 8, the
penultimate carbon is shown unsubstituted; it can only be substituted by
an amino group, i.e., (3—NH2). However, the last carbon of that
toxicophore is shown with an attached hydrogen. It cannot be substituted
by any other atom.

Table 19.3 Derivation of Toxicophore: The 19 Molecules Containing Fragment SH—CH2

Chemicals                                    Potency^a
2,3-Dimercapto-1-propanol                    55
2-Mercaptoethanesulfonic acid                55
2-Mercaptoethyl methyl sulfone               45
2-Mercaptoethyl urea                         35
2-Methoxyethyl mercaptoacetate               35
N-(1,1-dimethylolethyl) mercaptoacetamide    35
N,N-dimethyl mercaptoacetamide               35
N-(2-mercaptoethyl) acetamide                25
N-(2-mercaptoethyl) pyrrolidone              55
N-(mercaptoacetyl) urea                      35
N-(mercaptoacetyl) glycine                   35
N-(mercaptoethyl) morpholine                 35
N-methyl mercaptoacetamide                   35
N-trimethylolmethyl mercaptoacetamide        45
Cysteine                                     45
Mercaptoacetamide                            55
Mercaptoacethydrazide                        45
Mercaptoacetic acid                          45
Thioglycerol                                 55

The program identifies the chemicals that are responsible for
toxicophore No. 5 of Table 19.2 (see also Fig. 19.7). The toxicophore is
shown embedded in a molecule in Fig. 19.5.

^a The allergenic potencies were defined based on the percent responders
in the human maximization test as follows (33): 10, non-sensitizer; 25,
"marginal" (4-7% responders); 39, "weak" (8-23% responders); 49,
"moderate" (24-55% responders); 59, "strong" (56-83% responders); 69,
"extreme" (84-100% responders).

Thus, when faced with a chemical of unknown activity, the program uses
the presence or absence of toxicophores and of modulators to predict its
toxicity (Figs. 19.1-19.3). For example, the presence of the toxicophore
OH—c= (a phenol) endows a chemical with an 87.5% probability of being a
contact allergen and a potency of 51 (moderate activity, see Table
19.3). That basal activity is modulated by -25.8 × electronegativity
(see Table 19.4). For the example in Fig. 19.1, this results in a
further increase in potency. The total potency of 55 units corresponds
to a moderately strong activity (Table 19.3). A chemical with that
toxicophore may also contain a structural modulator that augments the
basal activity further (Fig. 19.2). On the other hand, the chemical may
contain a modulator that completely abolishes its potential to be an
allergen (Fig. 19.3). Additionally, the MULTICASE SAR program will
identify substructures that are absent from the learning set and
therefore may introduce an element of uncertainty in the prediction,
i.e., the "unknown" substructure could represent a potential
toxicophore, a modulator that alters a recognized toxicophore, or a
noninformative structure unrelated to toxicity (Fig. 19.4).
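
The potency arithmetic reported in the program outputs of Figs. 19.1, 19.3, and 19.5 appears to be a simple sum of the toxicophore's QSAR constant and the contributions of the modulators that are present. The sketch below reproduces those totals under that assumption; the function name is hypothetical, and the numbers are copied from the figures rather than recomputed from molecular properties.

```python
def projected_potency(constant, modulator_contributions):
    """Total projected activity (CASE units): QSAR constant of the
    toxicophore plus the contribution of every modulator present."""
    return constant + sum(modulator_contributions)


# Values as printed in the figures (constant, modulator contributions).
fig_19_1 = projected_potency(51.04, [3.83])           # electronegativity term
fig_19_3 = projected_potency(51.04, [-53.33, 2.51])   # inactivating fragment + electronegativity
fig_19_5 = projected_potency(52.50, [-7.6, -0.67])    # 2D distance modulator + electronegativity

for name, value in [("Fig. 19.1", fig_19_1),
                    ("Fig. 19.3", fig_19_3),
                    ("Fig. 19.5", fig_19_5)]:
    print(f"{name}: {value:.2f} CASE units")
```

The same sum for Fig. 19.2 (51.04 + 8.33 + 4.29) gives 63.66 against the printed 63.65, presumably because the program rounds its contributions only for display.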

It should be stressed that not every experi¬ 
mental data set gives rise to a coherent SAR 
model. Failure to construct a model may be 
caused by the fact that the experimental data 
are invalid or that they do not reflect a specific 
toxicological phenomenon. Additionally, the 
phenomenon under investigation may be so 
complex or be the result of so many different 
mechanisms that the experimental database is 
not sufficiently large to describe it. With this 
in mind, it should be stressed that the predic- 
tivity of the SAR model will be a reflection of 
the complexity of the phenomenon, the size of 
the database (i.e., the number of chemicals for 
which experimental data are available), and the 
ratio of actives/inactives in the dataset (3, 22). 


In view of the above considerations, once 
an SAR model has been developed, it requires 
extensive validation and characterization. 

1.3 Model Characterization 

As mentioned above, the nature of the SAR model that is derived is a
reflection of the complexity of the toxicological phenomenon that it
describes, as well as of the size of the learning set and the extent to
which it includes chemical classes and/or substructures that are
representative of the chemical species to which it will be applied.
Thus, the chemical substructures present among therapeutics are far more
numerous and diverse than, for example, those used or generated in the
chemical or agricultural industries. This means that SAR models used to
examine pharmacologically active substances must contain a greater
variety of chemical substructures. This may well translate into a
requirement for a larger experimental data set (i.e., one containing an
increased number of chemicals).

In evaluating the SAR model, it is important to determine the
relationship between its predictivity and the size of the database to
determine whether the model is optimal. This can be ascertained by first
determining the model's predictivity (see below), and then
systematically decreasing the size of the database by random deletion of
chemicals to determine the predictive parameters of the model derived
from the reduced data set. Doing this iteratively will allow a
determination of the relationship between database size and concordance
between predicted and experimentally derived results (22). If the
relationship, including the value for the SAR model derived from the
total database, is linear, then the model will not be optimally
predictive, and consideration should be given to obtaining additional
experimental data and deriving a further model. On the other hand, if
the relationship including the data for the SAR model derived from the
total database is no longer linear, the size of the data set may be
satisfactory. Incremental data may not yield a correspondingly
significant increase in the model's performance. Thus, the predictivity
of the SAR model of mutagenicity in Salmonella improves linearly until a
database size of 350 chemicals is reached, and then it plateaus (22).
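
The random-deletion procedure described above can be sketched as follows. This is an illustration only: `fit_and_score` is a hypothetical callable standing in for whatever SAR program builds a model from a subset and returns its concordance, and the fractions, the number of repeats, and the toy scorer are assumptions chosen for the example.

```python
import random


def learning_curve(chemicals, fit_and_score,
                   fractions=(1.0, 0.8, 0.6, 0.4), repeats=5, seed=0):
    """Mean predictivity at several database sizes.

    For each fraction, a random subset of the database is drawn `repeats`
    times (i.e., chemicals are randomly deleted), a model is built and
    scored on each subset, and the scores are averaged.
    """
    rng = random.Random(seed)
    curve = []
    for frac in fractions:
        size = max(1, round(frac * len(chemicals)))
        scores = [fit_and_score(rng.sample(chemicals, size))
                  for _ in range(repeats)]
        curve.append((size, sum(scores) / len(scores)))
    return curve


def toy_score(subset):
    """Stand-in scorer (NOT a real SAR model): concordance rises with
    the number of chemicals and then levels off at 0.85."""
    return min(0.85, 0.5 + 0.001 * len(subset))


curve = learning_curve(list(range(500)), toy_score)
for size, score in curve:
    print(size, round(score, 3))
```

With a real scorer, a curve that is still rising at the full database size would suggest that more experimental data are needed, whereas a plateau would suggest that the data set is adequate.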
Table 19.4 List of Modulators Related to Toxicophore OH—c=

No.  Fragment                                  QSAR contribution (Constant = 51.0)
1    2D [N—] <—6.5 Å—> [NH—]                   16.7
2    CO—CH2—CH2—                               -53.3
3    cH=c)—c=cH— (3—CO)                        -38.8
4    OH—c=c—CO—c=                              25.2
5    cH=c—cH=cH—c= (2—OH)                      8.3
6    OH2—CH—c=cH—cH=cH— (3—cH—)                13.4
7    cH=cH—cH=c—cH=cH— (4—OH)                  -10.3
8    cH=cH—cH=cH—c=cH— (5—OH)                  -10.3
9    OH— =c—O—CO—CH2—CH—                       -20.3
10   OH—c=cH—cH=cH—cH=cH—                      -20.6
11   OH—c=c—cH=c—CH2—CH2—CH3                   -22.4
12   (HOMO + LUMO)/2                           -25.8

Modulators associated with toxicophore No. 4 of Table 19.2. Each of the
modulators augments or decreases the activity inherent in the
toxicophore (i.e., 51.0 units; see Figs. 19.1-19.3). (HOMO + LUMO)/2
describes the electronegativity of the molecule. That value is
multiplied by -25.8. Modulators No. 5 and No. 2 are shown embedded in
chemicals in Figs. 19.2 and 19.3, respectively. Modulator No. 1
describes a 2D distance descriptor of 6.5 Å between two atoms. For
interpretation of the structures see legend to Table 19.2.
The molecule contains the Toxicophore (nr.occ.= 2):

OH -c=

*** 76 out of the known 86 molecules ( 88%) containing such a
Toxicophore are Contact Allergens with an average activity of 49.
(conf.level=100%)

*** QSAR Contribution : Constant is 51.04

** The following Modulator is also present:
Electronegativity = -0.15 ; Its contribution is 3.83

** Total projected QSAR activity 54.87

*** The probability that this molecule is a Contact Allergen is 87.5% **

** The projected Allergic Potency is 54.9 CASE units **

Figure 19.1. Prediction of the contact allergenicity of
2-methyl-1,4-benzenediol. The prediction is based on the presence of the
toxicophore (shown in bold). The potency is modulated further by the
electronegativity (see Table 19.4). A potency of 55 units indicates a
moderately strong allergen (see Table 19.3).


Another concern relates to the effect of the ratio of active to inactive
chemicals in the data set. Some SAR models are most predictive when that
ratio is unity (3, 22). Hence, for a model that will be widely used for
hazard identification and risk assessment purposes, it is important to
determine whether its performance is optimal. Thus, if the number of
inactives exceeds the number of actives, the number of inactives can be
decreased by randomly removing the appropriate number of inactives and
determining the performance of the resulting SAR model. The random
deletion of inactives and the model derivation should be repeated
several times to ascertain that a robust model has been derived. Because
the nature of the toxicophores is determined primarily by the actives
and because the "quality" of the toxicophores is a function of the size
of the database (22, 34, 35), it follows that if the number of actives
exceeds the number of inactives, removal of actives to achieve a ratio
of unity is not the optimal solution. Rather, we have found that
supplementing the database with randomly selected chemicals from a
"pool" of normal physiological chemicals (amino acids, sugars, lipids,
purines, pyrimidines, etc., but excluding hormones, prostaglandins, and
vitamins), assuming these chemicals to be inactive, is a viable
alternative (36, 37). This is based on the recognition that the
biological and/or toxicological phenomena being modeled occur in a
milieu that is rich in these physiological chemicals.
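
Both balancing strategies can be sketched in Python. The function names and the example lists are hypothetical; in particular, treating the physiological chemicals as inactive is the assumption stated in the text, not something the code verifies.

```python
import random


def balance_by_removing_inactives(actives, inactives, seed=None):
    """Randomly drop surplus inactives so that actives:inactives = 1."""
    rng = random.Random(seed)
    if len(inactives) <= len(actives):
        return list(actives), list(inactives)
    return list(actives), rng.sample(inactives, len(actives))


def pad_with_presumed_inactives(actives, inactives, physiological_pool,
                                seed=None):
    """When actives outnumber inactives, add randomly chosen physiological
    chemicals (ASSUMED inactive) instead of discarding actives."""
    rng = random.Random(seed)
    shortfall = len(actives) - len(inactives)
    if shortfall <= 0:
        return list(inactives)
    if shortfall > len(physiological_pool):
        raise ValueError("pool too small to reach a ratio of unity")
    return list(inactives) + rng.sample(physiological_pool, shortfall)


# Hypothetical data set: 3 actives, 6 inactives.
actives = ["a1", "a2", "a3"]
inactives = ["i1", "i2", "i3", "i4", "i5", "i6"]

# Repeat the random deletion several times; a model would be derived
# from each replicate to check that the result is robust.
replicates = [balance_by_removing_inactives(actives, inactives, seed=s)
              for s in range(5)]

# Opposite case: 4 actives, 1 inactive, padded from a small pool.
padded = pad_with_presumed_inactives(
    ["a1", "a2", "a3", "a4"], ["i1"],
    ["glycine", "glucose", "adenine", "cytosine", "alanine"], seed=1)
```

Each replicate keeps all actives, which is consistent with the observation that the toxicophores are determined primarily by the active chemicals.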

The molecule contains the Toxicophore (nr.occ.= 2):

OH -c=

*** 76 out of the known 86 molecules ( 88%) containing such a
Toxicophore are Contact Allergens with an average activity of 49.
(conf.level=100%)

*** QSAR Contribution : Constant is 51.04

** The following Modulators are also present:
(1) cH =c -cH =cH -c = <2-OH> Activating 8.33
Electronegativity = -0.17 ; Its contribution is 4.29

** Total projected QSAR activity 63.65

*** The probability that this molecule is a Contact Allergen is 87.5% **

** The projected Allergic Potency is 63.7 CASE units **

Figure 19.2. Prediction of the contact allergenicity of
4-chloro-1,3-benzenediol. In addition to the probability of activity and
the basal potency derived from the toxicophore (shown in bold in A), the
chemical also contains an activating modulator (shown in bold in B),
which further augments the potency. A potency of 64 units indicates a
very strong potency (Table 19.3).

Finally, the "informational content" of an SAR model determines its
coverage. Thus, if a test molecule contains a substructure unknown to
the model, this introduces a measure of uncertainty into the SAR
prediction. In the MULTICASE SAR program, such an "unknown" moiety is
flagged (Fig. 19.4). We have found that a satisfactory approach to
determining informational content is to challenge an SAR model with a
panel of 10,000 chemicals representative of the "universe of chemicals"
and to determine the frequency with which the SAR predictions are
accompanied by a "warning" of the presence of "unknown" substructures.
An enumeration of the frequency with which the individual unknown
moieties are present will allow a determination of their importance and
thereby identify chemicals that should be tested and the results
included in the model to improve the predictive performance. This is
based on the observation that the greater the informational content
(i.e., the fewer warnings of "unknown" moieties), the greater the
model's predictivity (22, 34, 35).
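
Tallying the warnings from such a challenge panel is straightforward. The sketch below assumes that the warnings have already been extracted from the program output into one list of unknown fragments per chemical; the function name and the four-chemical panel are hypothetical.

```python
from collections import Counter


def warning_summary(warnings_per_chemical):
    """Summarize 'unknown substructure' warnings over a challenge panel.

    warnings_per_chemical: one list per panel chemical containing the
    unknown fragments flagged for it (empty list = no warning).
    Returns (fraction of chemicals flagged, fragments ranked by the
    number of chemicals in which they occur).
    """
    if not warnings_per_chemical:
        raise ValueError("empty panel")
    flagged = sum(1 for frags in warnings_per_chemical if frags)
    counts = Counter(frag
                     for frags in warnings_per_chemical
                     for frag in set(frags))  # count each chemical once
    return flagged / len(warnings_per_chemical), counts.most_common()


# Hypothetical four-chemical panel (a real panel would hold ~10,000).
panel = [[], ["O-C.=C."], ["O-C.=C.", "X-Y"], []]
fraction_flagged, ranked = warning_summary(panel)
print(fraction_flagged, ranked)
```

The fragments at the top of the ranking indicate which kinds of chemicals would add the most information if they were tested and included in the learning set.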

1.4 Model Validation 

In its application to toxicology, SAR can serve two functions: (1) to
predict a specific toxicological effect based on the identification of
substructures significantly associated with that activity and (2) to
gain insight into the mechanistic basis of that effect.

The molecule contains the Toxicophore (nr.occ.= 1):

OH -c=

*** 76 out of the known 86 molecules ( 88%) containing such a
Toxicophore are Contact Allergens with an average activity of 49.
(conf.level=100%)

*** QSAR Contribution : Constant is 51.04

** The following Modulators are also present:
(1) CO -CH2-CH2- Inactivating -53.33
Electronegativity = -0.10 ; Its contribution is 2.51

** Total projected QSAR activity 0.22

** The molecule contains the following DEACTIVATING Fragment:
CO -CH2 -CH2 -c=

The probability that this molecule is a Contact Allergen is 63.6% **

Figure 19.3. Prediction of the lack of contact allergenicity of
zingerone. Whereas the presence of the toxicophore (A) is associated
with a probability of activity and a potency, the presence of the
inactivating modulator (B) abolishes the potency. Moreover, the presence
of a deactivating moiety (C), which is present in five chemicals in the
database that are devoid of allergenicity (Table 19.2, No. 19), further
decreases the likelihood that zingerone is a contact allergen.

To be useful in its predictive mode, the performance of a model does not
need to be perfect, but it must be known. The predictivity of an SAR
model is defined by the concordance between the predictions for
chemicals external to the SAR model and the experimentally determined
toxicities. The predictivity is governed by the sensitivity (number of
correct positive predictions/total number of positive chemicals) and the
specificity (number of correct negative predictions/total number of
negative chemicals) (22). Moreover, because the basic function of SAR
applied to toxicological phenomena is the prevention, reduction, or
elimination of harmful chemicals from the home, the environment, and the
workplace, risk-averse prediction models are preferred. That is achieved
by the development of SAR models that yield a low frequency of false
negative predictions, i.e., high sensitivity. Obviously, ideally the
model should have high sensitivity as well as high specificity (38).
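
The three performance measures follow directly from the definitions given above. A minimal Python sketch, with hypothetical predictions and experimental results for ten external chemicals:

```python
def predictivity(predicted, observed):
    """Sensitivity, specificity and concordance of binary predictions.

    predicted, observed: equal-length sequences of booleans
    (True = toxic/positive) for chemicals external to the model.
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    tp = sum(p and o for p, o in zip(predicted, observed))
    tn = sum((not p) and (not o) for p, o in zip(predicted, observed))
    positives = sum(observed)
    negatives = len(observed) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both positive and negative chemicals")
    return {
        "sensitivity": tp / positives,      # correct positives / all positives
        "specificity": tn / negatives,      # correct negatives / all negatives
        "concordance": (tp + tn) / len(observed),
    }


# Hypothetical external test set: 4 positive and 6 negative chemicals.
observed = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, True, True]
stats = predictivity(predicted, observed)
print(stats)
```

In this example one positive chemical is missed and two negative chemicals are called positive, so the sensitivity is 0.75, the specificity 4/6, and the concordance 0.70.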
Structural Concepts in the Prediction of the Toxicity of Therapeutical Agents

*** WARNING *** The following functionalities are UNKNOWN to me:
*** O -C. =C. -

** The molecule does not contain any known Biophore
it is therefore presumed to be INACTIVE

Figure 19.4. Prediction of the lack of contact allergenicity of
dehydroalantolactone. The chemical contains no toxicophore; therefore,
it is presumed to be inactive. However, it contains two structures
(shown in bold) that are "unknown" to the model. That introduces an
element of uncertainty in the prediction.


The simplest way to determine predictivity parameters is to remove
initially from the data set a random representative sample (e.g., 5%) to
be used as a "tester set," to develop the SAR model on the remaining
chemicals (i.e., 95%), and then to challenge the model with the "tester
set" and ascertain the predictivity. However, as has been demonstrated
on a number of occasions, the predictivity of an SAR model is determined
by the size of the database (22). Because in most instances the size of
the available data set is not optimal, further decreasing the size of
the learning set by sequestering the "tester set" is undesirable.

To overcome this limitation, a cross-validation approach has been used
(39). In that procedure, a portion of the database (e.g., 5%) is
randomly selected and removed, and a model is developed from the
remaining 95%. That model is challenged with the "tester set" (5%). That
procedure is repeated 20 times, and the cumulative predictivity is
determined. The final SAR model includes the complete database (i.e.,
100%). Because the predictive performance is a function of the size of
the database, the performance of the final model will be better than
that based on 95% of the data. When, however, the learning set consists
of fewer than 150 chemicals, a more tedious procedure may be required,
wherein one or two chemicals are removed at a time to serve as the
"tester set" (i.e., the model is built on n-1 or n-2 chemicals) and the
process is repeated n or n/2 times.
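
The cross-validation procedure can be sketched as below. Two assumptions are made that the text does not specify: the 20 tester sets are taken as disjoint folds, so that every chemical is held out exactly once, and `build_model` is a hypothetical callable that stands in for the SAR program and returns a prediction function. The toy threshold "model" exists only to make the example runnable.

```python
import random


def cross_validate(chemicals, labels, build_model, n_folds=20, seed=0):
    """Cumulative concordance over n_folds disjoint tester sets.

    With n_folds=20 each tester set holds about 5% of the database;
    n_folds=len(chemicals) gives the leave-one-out (n-1) variant.
    """
    idx = list(range(len(chemicals)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for tester in folds:
        held_out = set(tester)
        train = [i for i in idx if i not in held_out]
        predict = build_model([chemicals[i] for i in train],
                              [labels[i] for i in train])
        correct += sum(predict(chemicals[i]) == labels[i] for i in tester)
    return correct / len(idx)


def threshold_model(train_x, train_y):
    """Toy stand-in for an SAR program: calls a 'chemical' (here just a
    number) active if it is at least as large as the smallest active
    seen in training."""
    cutoff = min(x for x, y in zip(train_x, train_y) if y)
    return lambda x: x >= cutoff


chemicals = list(range(40))
labels = [x >= 20 for x in chemicals]
accuracy_20_fold = cross_validate(chemicals, labels, threshold_model)
accuracy_loo = cross_validate(chemicals, labels, threshold_model,
                              n_folds=len(chemicals))
print(accuracy_20_fold, accuracy_loo)
```

In this toy case the only error occurs when the borderline chemical (20) is held out, which illustrates why the model finally built on the complete database can be expected to perform at least as well as the cross-validated estimate.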

1.5 Applications and Mechanistic Studies 

As has been mentioned earlier (Table 19.1), SAR methodologies can be
divided into two general non-mutually exclusive approaches: (1)
hypothesis driven and (2) knowledge based. The former is rule driven,
wherein specific properties or chemical substructures are looked for,
e.g., mutagens are electrophiles and hence one would look for
electrophilic or proelectrophilic moieties. This approach assumes that
mutations are caused solely by covalent binding of electrophiles to DNA.
Agents that induce mutations by a nonelectrophilic (i.e., non-DNA
damaging) mechanism will not be detected. Thus, agents that mutagenize
purely as a result of intercalation between DNA base pairs (e.g.,
acridine orange, ethidium bromide) will not be identified. Such rules
are based on prior knowledge and/or intuition and do not necessarily
require adherence to strict statistical criteria.

The molecule contains the Biophore

SH -CH2

*** 18 out of the known 19 molecules ( 95%) containing such a
toxicophore are Contact Allergens with an average activity of 42.
(conf.level=100%)

*** QSAR Contribution : Constant is 52.50

** The following Modulators are also present:
(2D) [CO -] <-- 5.2A --> [SH -] Inactivating -7.6
Electronegativity = 0.10 ; Its contribution is -0.67

** Total projected QSAR activity 44.23

*** The probability that this molecule is a Contact Allergen is 95.0% **

The projected Allergic Cont activity is 44.3 CASE units **

Figure 19.5. Prediction of the contact allergenicity of
N-acetyl-L-cysteine. The prediction is based on the presence of the
toxicophore (shown in bold), which is present in 19 chemicals in the
database (18 allergens and 1 marginal allergen; see Table 19.3). The
arrow indicates the 5.2 Å distance described by the inactivating
modulator.

The approach illustrated herein, exemplified by MULTICASE (3), is
knowledge based. The input consists of the structures and toxicological
activities of the chemicals in the learning set. The program then
identifies structural descriptors (toxicophores) that are significantly
associated with activity (see Table 19.3). The human expert participates
in setting criteria for the inclusion of experimental results in the
database (3) as well as in examining the plausibility of the final model
based on exact knowledge of the toxicological phenomenon under
investigation. The human expert also determines the acceptability of
individual predictions (see below).

Once an SAR model has been developed and validated, it can be applied in
a number of fashions. SAR methodologies, such as MULTICASE (3-5), which
document predictions (Table 19.2), are obviously preferable to those
that operate like a "black box." The latter simply provide a likelihood
that a test chemical is active or inactive. When, however, the SAR
prediction is accompanied by documentation of the basis of that
forecast, the human expert can determine whether it is justified and
whether it applies to the specific test chemical.

The molecule contains the Toxicophore

S -C =N -C =

*** 5 out of the known 5 molecules (100%) containing such Biophore
are Mouse carcinogens with an average activity of 62.
(conf.level=97%)

** This Biophore exists in a significantly different environment
than in the data base (i.e. 5.45); It may not be relevant

*** QSAR Contribution : Constant is 64.00

** Total projected QSAR activity 64.00

*** The probability that this molecule is a Mouse carcinogen is 85.7%

** The projected Mouse carcinogenic potency is 64.0 CASE units **

Figure 19.6. Prediction of the carcinogenicity in mice of epitholone A.
The structure of epitholone A (toxicophore shown in bold) is given in
Fig. 19.7.

Thus, the mucolytic agent N-acetyl-L-cysteine is predicted to have a
potential to induce allergic contact dermatitis by virtue of the
biophore SH—CH2 (Fig. 19.5). Moreover, examination of the chemicals that
contribute to that toxicophore reveals that indeed they all have the
substructure in an environment that is similar to the one found in
N-acetyl-L-cysteine (Table 19.3). On the other hand, the tubulin
polymerization perturber (and potential antineoplastic agent) epitholone
A (Fig. 19.6) is predicted to be a mouse carcinogen by virtue of the
toxicophore units shown in bold. That toxicophore is present in five
molecules in the learning set. The presence of that toxicophore is
associated with an 89% probability of carcinogenicity and a potency of
63 units, which corresponds to a TD50 value of 0.039 mmol/kg per day
(40). However, the program flags the toxicophore because its environment
in epitholone A is significantly different from that of the molecules in
the learning set (Fig. 19.6). In fact, examination of the structures of
the molecules that contribute to the biophore (Fig. 19.7) indicates that
indeed the molecules are quite different from epitholone A, and hence,
the prediction of carcinogenicity can be disregarded (however, also see
below).

Moreover, the molecules that contributed to this toxicophore (Fig.
19.7), even though they contain the S—C=N—C= moiety (Fig. 19.6), also
contain functionalities (i.e., "structural alerts") that are associated
with carcinogenicity/genotoxicity, such as nitro, amino, and hydrazino
groups. In fact, these could be responsible for the murine
carcinogenicity of these chemicals. Obviously, these latter
functionalities are absent in epitholone A.
Figure 19.7. Structures of epitholone A and of chemicals that contain
the toxicophore. The toxicophore (Fig. 19.6) is shown in bold.


Table 19.5 SAR Predictions Related to the Potential Carcinogenicity of Epitholone A

SAR Model                     Prediction   References
Mutagenicity: Salmonella      Negative     22, 47
Error-prone DNA repair        Negative     48
Unscheduled DNA synthesis     Negative     49
Mouse MTD                     Positive     50
Rat LD50                      Positive     SAR model based on RTECS
Cell toxicity                 Positive     51
Inhibition GJIC               Negative     52

A positive response indicates a potential for a maximum tolerated dose
(MTD) of less than 0.9 mmol/kg; an LD50 value of less than 7.2 mmol/kg;
or a toxicity (IC50) for cultured BALB/3T3 cells of less than 1 pM.

GJIC, gap junctional intercellular communication; RTECS, Registry of
Toxic Effects of Chemical Substances.
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Table 19.6 Predicted Toxicological Profile of N-Acetylcysteine 


SAE Model 

Multicase 

Probability (%) 

Potency (units) 

Structure alerts 

0 

0 

Salmonella mutagenicity 

0 

0 

SOS chromotest 

0 

0 

umu/SOS repair 

0 

0 

Carcinogenicity: rodents-NTP 

0 

0 

Carcinogenicity: mice-NTP 

0 

0 

Carcinogenicity: rats-NTP 

0 

0 

Carcinogenicity: rodent-CPDB 

0 

0 

Carcinogenicity: mice-CPDB 

0 

0 

Carcinogenicity: rats-CPDB 

0 

0 

Inhibition gap junction intercell comm 

0 

0 

Binding to Ah receptor 

0 

0 

Mutations in mouse lymphoma (NTP) 

0 

0 

Mutations in mouse lymphoma (GenTox) 

0 

0 

Sister chromatic exchanges in vitro 

0 

0 

Chromosomal aberrations in vitro 

0 

0 

Unscheduled DNA synthesis in vitro 

0 

0 

Cell transformation 

0 

0 

Drosophila somatic mutations 

0 

0 

Sister chromatic exchanges in vivo 

0 

0 

Induction of micronuclei in vivo 

0 

0 

Yeast malsegregation 

0 

0 

Inhibition cf tubulin polymerization 

0 

0 

Sensory irritation 

89 

72 

Eye irritation 

72 

52 

Respiratory hypersensitivity 

0 

0 

Allergic contact dermatitis 

95 

44 

Rat lethality (LD50) 

0 

0 

Mouse MTD 

0 

0 

Rat MID 

0 

0 

Cellular toxicity (3T3) 

0 

0 

Cellular toxicity (HeLa) 

0 

0 

Nephrotoxicity: male rats (a2/aglobulin) 

0 

0 

Inhibition human cyt. P4502D 

0 

0 

Developmental toxicity: hamster 

0 

0 

Developmental toxicity: human 

0 

0 

Aquatic toxicity (minnows) 

0 

0 

Water solubility: 3.88 

logP (Octanol: water): 

-1.79 

Electronegativity: 0.10 




NTP and CPDB refer to the U.S. National Toxicology Program carcinogenicity assays (45) and to the Carcinogenic 
Potency Data Bases (46), respectively. 


The molecule contains the Biophore (nr.occ.= 1):

NH -CH -C. =

*** 38 out of the known 41 molecules (93%) containing such a Biophore
are perturbers of Tubulin Polymerization
*** QSAR Contribution : Constant is 85.89

** Total projected QSAR activity 85.89

** The probability that this molecule inhibits Tubulin
Polymerization is 93% **

** The projected Tubulin Polymerization Inhibitory activity is 85.9
CASE units **

Figure 19.8. Prediction of the ability of colchicine to inhibit tubulin polymerization. The structure
of colchicine is shown in Fig. 19.9. The biophore is shown in bold (a) in Fig. 19.9.

Based on all of these considerations, the "human expert" would
overrule the prediction of rodent carcinogenicity. Additionally, in
overriding the computer-based prediction, cognisance was also taken
of the understanding that the vast majority of recognized human
carcinogens are genotoxicants, i.e., "genotoxic carcinogens" (41-44).
Epitholone A, on the other hand, was not predicted to be genotoxic
(i.e., a DNA-damaging agent), as evidenced by its lack of potential
to induce mutations in Salmonella, error-prone DNA repair, or
unscheduled DNA synthesis in rat hepatocytes (Table 19.5). Thus, even
if the potential for murine carcinogenicity were accepted, in view of
the fact that the vast majority of recognized human carcinogens are
mutagens/genotoxicants or are hormones and epitholone A is neither,
it would not represent a human risk.
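The expert's reasoning about epitholone A can be laid out as a small sketch. The predictions are transcribed from Table 19.5; the grouping of endpoints into "genotoxicity" and "toxicity" sets and the function name `summarize` are illustrative assumptions, not part of the SAR program described in the text.

```python
# Predictions transcribed from Table 19.5 (True = positive, False = negative).
predictions = {
    "Mutagenicity: Salmonella": False,
    "Error-prone DNA repair": False,
    "Unscheduled DNA synthesis": False,
    "Mouse MTD": True,
    "Rat LD50": True,
    "Cell toxicity": True,
    "Inhibition GJIC": False,
}

# Assumed grouping, following the argument in the text: the first three
# endpoints are taken as markers of genotoxicity, the next three as markers
# of systemic or cellular toxicity.
GENOTOXICITY_ENDPOINTS = (
    "Mutagenicity: Salmonella",
    "Error-prone DNA repair",
    "Unscheduled DNA synthesis",
)
TOXICITY_ENDPOINTS = ("Mouse MTD", "Rat LD50", "Cell toxicity")


def summarize(preds):
    """Collapse the individual predictions into the three questions the expert asks."""
    return {
        "genotoxic": any(preds[name] for name in GENOTOXICITY_ENDPOINTS),
        "inhibits_gjic": preds["Inhibition GJIC"],
        "toxic_at_high_dose": any(preds[name] for name in TOXICITY_ENDPOINTS),
    }


summary = summarize(predictions)
print(summary)
```

Under these assumptions the summary reports no predicted genotoxicity and no inhibition of intercellular communication, but a potential for toxicity, which is the pattern the text goes on to discuss as a possible non-genotoxic explanation.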

If, based on the above, it were accepted that epitholone A is not
genotoxic, and if the human expert examining the documentation wished
not to override the prediction of carcinogenicity in mice based on
the differences in chemical environments between epitholone A and the
molecules responsible for the toxicophore (Figs. 19.6 and 19.7), he
could examine mechanisms of non-genotoxic carcinogenicity, even
though their relevance to humans may be limited. One of the
mechanisms of non-genotoxic carcinogenicity is inhibition of
intercellular communication (53). Epitholone A does not possess such
a potential (Table 19.5). Another mechanism for non-genotoxic rodent
carcinogenesis may involve systemic or cell toxicity followed by
mitogenesis (54-56). This may occur as a consequence of including the
maximum tolerated dose (MTD) in the cancer bioassay protocol. When
this is done, up to 50% of chemicals tested are found to be rodent
carcinogens (54). Obviously, this MTD situation rarely, if ever,
applies to humans. Still, epitholone A has the potential for inducing
cellular as well as systemic toxicity (Table 19.5), which may explain
its potential carcinogenicity in mice, were we to discount the
difference in chemical environment.

Figure 19.9. Structure of colchicine. The biophore A (bold, see
Fig. 19.8) is responsible for the therapeutic effectiveness.
Toxicophore B (see Fig. 19.10; shown in bold) is responsible for the
induction of sister chromatid exchanges (SCE) in vivo. Removal of
toxicophore B or its replacement by isopropoxy groups abolishes the
induction of SCEs without affecting the therapeutic potential.

Obviously, the availability of a number of characterized and
validated SAR models allows the development of a putative
toxicological profile (Table 19.6). This can be used as a guideline
in the product developmental phase to select lead compounds least
likely to induce unwanted side effects. However, the SAR approach can
also be used to optimize beneficial effects and decrease or eliminate
unwanted toxic effects.

The molecule contains the toxicophore (nr.occ.= 3):

CH3 -O -c

8 out of the known 8 molecules (100%) containing such a
toxicophore are Mouse SCE inducers with an average activity of
57.

QSAR Contribution : Constant is 73.17

** The following Modulators are also present:

( 3) CH3-O -c =       Inactivating -7.41

( 1) CH3-O -c =cH     Inactivating -7.41

Log partition coeff.= 3.19 ; LogP contribution is -7.65

** Total projected QSAR activity 50.70

** The probability that this molecule induces Mouse SCEs is 90.0% **

** The projected Mouse SCE inducing activity is 50.7 CASE units

Figure 19.10. The potential of colchicine to induce sister chromatid exchanges in vivo. The structure
of colchicine and of the toxicophore B is given in Fig. 19.9. One of the inactivating modulators (c)
is also shown in bold in Fig. 19.9.
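The numbers reported in the program output of Fig. 19.10 can be checked with a short calculation. The sketch below assumes that the total projected activity is the plain sum of the constant, the modulator terms, and the log P contribution; that additive form is inferred from the figures themselves, and the function name is hypothetical.

```python
# Values transcribed from the program output in Fig. 19.10.
constant = 73.17              # QSAR constant for the toxicophore
modulators = [-7.41, -7.41]   # the two inactivating modulators
logp_contribution = -7.65     # contribution reported for log P = 3.19


def projected_activity(constant, modulators, logp_contribution):
    """Add up the terms (assumed additive) to give the projected activity in CASE units."""
    return constant + sum(modulators) + logp_contribution


total = projected_activity(constant, modulators, logp_contribution)
print(f"{total:.2f}")  # prints 50.70
```

The sum reproduces the "Total projected QSAR activity" of 50.70 shown in the figure, which suggests that each modulator line contributes once regardless of the occurrence count printed in front of it.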

Thus, let us examine colchicine (CH), an anti-inflammatory agent that
has been in use for several centuries for the treatment of gout. The
anti-inflammatory potential of CH is understood to derive from its
ability to inhibit tubulin polymerization (iTP) (57). That is also
the basis of the anticancer activity of paclitaxel (Taxol) (58-60).
The structural basis of that activity derives from the presence in CH
of the NH—CH—C.= moiety (Figs. 19.8 and 19.9), which endows the
molecule with a 93% probability of activity. However, colchicine also
has the potential for inducing sister chromatid exchanges (SCEs) in
vivo (Fig. 19.10). This SCE-inducing ability may endow it with
genotoxic and developmental toxicity potentials. However, the
potential for inducing SCEs in vivo is associated with the methoxy
moiety (Figs. 19.9 and 19.10). Removal of that moiety or replacing it
with an isopropoxy group abolishes the SCE-inducing ability of CH
without affecting its potential for iTP (i.e., the basis of its
anti-inflammatory action).

Finally, SAR approaches can also be used to provide a basis for
making intelligent risk assessments. Thus, it has been shown that the
similarity in biophores/toxicophores present in different SAR models
of toxicological phenomena provides a measure of mechanistic
similarity (3). The SAR models of mutagenicity in Salmonella and of
error-prone DNA repair (SOS Chromotest) show significant overlaps
(Table 19.7). This is not unexpected because DNA is the target of
both phenomena, and the tester strain used for the Salmonella
mutagenicity assays contains a plasmid that codes for error-prone DNA
repair (61). In fact, there is a substantial (though not complete)
overlap among chemicals that cause the two phenomena (48, 62). On the
other hand, there is little overlap between Salmonella mutagenicity
and inhibition of gap junctional intercellular communication
(Table 19.7), which is considered the epigenetic (non-genotoxic)
phenomenon par excellence (53). Nor do the SAR models for Salmonella
mutagenicity and inhibition of tubulin polymerization overlap
significantly (Table 19.7), which is further support for the fact
that genotoxicity and inhibition of tubulin polymerization can be
dissociated (see above).

Table 19.7 Structural Commonalities among SAR Models

SAR Models                                     Percent
Salmonella mutagenicity and SOS chromotest       57
Salmonella mutagenicity and iGJIC                10
Salmonella mutagenicity and iTP                   9
Salmonella mutagenicity and Mnt                  53
Mnt and iTP                                      71

iGJIC, inhibition of gap junctional intercellular communications;
iTP, inhibition of tubulin polymerization; Mnt, induction of bone
marrow micronuclei in vivo.

With respect to the in vivo induction of micronuclei (Mnt), a
different situation prevails. There is considerable overlap between
the toxicophores associated with Mnt and those associated with the
induction of mutations in Salmonella (Table 19.7). This is not
surprising, because the induction of Mnt is known to involve a
genotoxic mechanism (63, 64). Indeed, in attempts to identify
potential genotoxic carcinogens, when a chemical is found to induce
mutations in Salmonella, this result is frequently followed by a Mnt
test to determine whether the chemical is genotoxic in vivo as well
(43, 65) and thus represents a risk to humans.

However, it was found that there is also substantial overlap between
Mnt and iTP, the latter being a non-genotoxic phenomenon (Table 19.7)
(66). This finding suggests that Mnt can occur by genotoxic as well
as non-genotoxic mechanisms. Thus, a positive Mnt response by a
chemical that does not induce mutations in Salmonella does not
necessarily represent a carcinogenic risk to humans.
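The way Table 19.7 is read in the preceding paragraphs can be sketched in a few lines. The overlap percentages are transcribed from the table; the 50% cutoff used to call two models "mechanistically related" is a hypothetical value chosen only for illustration, since the text does not state a threshold, and the function name `partners` is likewise an assumption.

```python
# Percent structural commonality between pairs of SAR models (Table 19.7).
overlap_percent = {
    frozenset({"Salmonella", "SOS chromotest"}): 57,
    frozenset({"Salmonella", "iGJIC"}): 10,
    frozenset({"Salmonella", "iTP"}): 9,
    frozenset({"Salmonella", "Mnt"}): 53,
    frozenset({"Mnt", "iTP"}): 71,
}

THRESHOLD = 50  # hypothetical cutoff, not taken from the source


def partners(model, threshold=THRESHOLD):
    """Return the models whose toxicophores overlap with `model` at or above the cutoff."""
    found = []
    for pair, percent in overlap_percent.items():
        if model in pair and percent >= threshold:
            (other,) = pair - {model}  # the pair has exactly one other member
            found.append(other)
    return sorted(found)


print(partners("Mnt"))
print(partners("Salmonella"))
```

With this cutoff, Mnt is linked both to Salmonella mutagenicity (a genotoxic endpoint) and to iTP (a non-genotoxic one), while Salmonella mutagenicity is not linked to iTP or iGJIC, which mirrors the argument that micronucleus induction can arise by either kind of mechanism.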

Discodermolide (Fig. 19.11) is a promising antineoplastic agent,
which, like paclitaxel, inhibits tubulin polymerization (67), but
being considerably more water-soluble than paclitaxel,
discodermolide may present certain therapeutic advantages while also
being effective against paclitaxel-resistant cells (67). Neither
discodermolide nor paclitaxel is mutagenic in Salmonella (and in
fact neither is predicted to be a rodent carcinogen). However, both
of these agents have a potential (determined by SAR) to induce Mnt
in vivo. In fact, for paclitaxel that potential has been determined
experimentally. This has led to the suggestion that paclitaxel,
because of its ability to induce Mnt, presented a carcinogenic risk
(68). However, based on the above findings (Table 19.7), it can be
assumed that the ability of discodermolide and of paclitaxel to
induce Mnt is independent of genotoxicity, and in fact, derives from
iTP. Thus, it does not represent an unreasonable risk to humans who
are treated with those antineoplastic agents. In fact, the biophores
in discodermolide responsible for the induction of Mnt and iTP are
identical (Fig. 19.11).

Figure 19.11. Structure of discodermolide. The circled biophore is
responsible for the inhibition of tubulin polymerization.

2 CONCLUSIONS

SAR methodologies, in their present state, coupled with human
expertise, can be used to determine and to understand the potential
toxicity of therapeutic agents. In fact, this approach can be used
to engineer molecules devoid of the moieties associated with these
unwanted side effects. It must be understood, however, that while
SAR techniques can be used to accelerate the identification and
development of safe therapeutic agents, they are to be used as an
adjunct to experimental determinations.

1 INTRODUCTION

Of the 520 new pharmaceuticals approved between 1983 and 1994, 39%
were derived from natural products, the proportion of antibacterials
and anticancer agents of which was over 60% (1). Between 1990 and
2000, a total of 41 drugs derived from natural products were
launched on the market by major pharmaceutical companies
(Table 20.1), including azithromycin, orlistat, paclitaxel,
sirolimus (rapamycin), Synercid, tacrolimus, and topotecan. In 2000,
one-half of the top-selling pharmaceuticals were derived from
natural products, having combined sales of more than US $40 billion.
These included the biggest selling anticancer drug paclitaxel, the
"statin" family of hypolipidemics, and the immunosuppressant
cyclosporin. During 2001 we have seen the launch of caspofungin from
Merck and galantamine from Johnson & Johnson, with rosuvastatin,
telithromycin, daptomycin, and ecteinascidin-743 due to follow in
2002.

Despite the figures, the popularity of natural products,
particularly those from higher plants as leads for new
pharmaceuticals, tends to fluctuate. At the time of writing, several
of the world's biggest pharmaceutical companies have reined back
their natural product drug discovery programs and have placed great
faith in combinatorial chemistry, coupled to very high throughput
screening. Time will tell whether this is a wise stratagem, or
whether the unique features of compounds that are themselves derived
from living organisms will once again see renewed acceptance.

The abundance of plant and microbial secondary metabolites and their
value in medicine are undisputed, but one question that is only
partly answered concerns the reasons for this abundance of complex
chemical substances. In the past, the production of what we would
now call "bioactive" substances was a mystery. A modern view is that
these compounds have a role in protecting the otherwise defenseless,
stationary plant from attack by mammals, insects, fungi, bacteria,
and viruses. Taking morphine as an example of a secondary metabolite
whose value to the plant is not entirely obvious, 14 steps are
required from available amino acids, including at least one step
that is highly substrate specific (2). The presence of morphine in
the tissues of Papaver somniferum must therefore confer a
selectional advantage on the plant (3): genetic code is required for
each of the enzymes involved in the biosynthesis, valuable amino
acids are utilized in forming the enzymes, and a relatively scarce
nutrient (nitrogen) is locked up in the compounds produced. If the
morphine did not continue to have value for the plant, mutants would
have arisen with the advantage of not having a drain on their
metabolic resources.

We can only guess at the ecological functions of morphine. Perhaps a
mammalian herbivore that consumed too many poppies would become
drowsy and itself fall prey to a carnivore. It may be significant
that the cannabinoids, produced in greatest abundance in the
nutritious growing tips of the plant, also induce mental effects
that would compromise a herbivore's ability to escape a predator.
Whatever their natural protective functions, natural products are a
rich source of biologically active compounds that have arisen as the
result of natural selection, over perhaps 300 million years. The
challenge to the medicinal chemist is to exploit this unique
chemical diversity. The following account illustrates how natural
products have been used as what are called lead compounds, or
templates for the development of important medicines.

Table 20.1 Drugs Derived from Natural Products (1990-2000)

Name                    Originator                Indication/Use
Acarbose                Bayer                     Diabetes
Artemisinin             Kunming & Guilin          Malaria
Azithromycin            Pliva                     Antibiotic
Carbenin                Sankyo                    Antibiotic
Cefetamet pivoxil       Takeda                    Antibiotic
Cefozopran              Takeda                    Antibiotic
Cefpimizole             Ajinomoto                 Antibiotic
Cefsulodin              Takeda                    Antibiotic
Clarithromycin          Taisho                    Antibiotic
Colforsin daropate      Nippon Kayaku             Asthma
Docetaxel               Aventis                   Cancer
Dronabinol              Solvay                    Alzheimer's disease
Galantamine             Intelligen                Alzheimer's disease, arthritis
Gusperimus              Nippon Kayaku             Arthritis
Irinotecan              Yakult Honsha             Cancer
Ivermectin              Merck & Co                Parasiticide
Lentinan                Ajinomoto                 Cancer
LW-50020                Sankyo                    Immunomodulation
Masoprocol              Access                    Cancer
Mepartricin             SPA                       Benign prostatic hyperplasia
Miglitol                Bayer                     Diabetes
Mizoribine              Asahi Chemical            Arthritis
Mycophenolate mofetil   Hoffman-LaRoche           Arthritis
Orlistat                Hoffman-LaRoche           Obesity
Paclitaxel              Bristol-Myers Squibb      Cancer
Pentostatin             Warner-Lambert            Leukemia
Podophyllotoxin         Nycomed Pharma            Human papillomavirus
Policosanol             Dalmer                    Hyperlipidaemia
Everolimus              Novartis                  Immunomodulation
Sirolimus               American Home Products    Immunomodulation
Sizofilan               Taito                     Cancer, hepatitis-B virus
Subreum                 OM Pharma                 Arthritis
Synercid                Novartis                  Antibiotic
Tacrolimus              Fujisawa                  Immunomodulation
Teicoplanin             Aventis                   Antibiotic
Tirilazad mesylate      Pharmacia & Upjohn        Subarachnoid haemorrhage
Topotecan               GlaxoSmithKline           Cancer
Ukrain                  Nowicky Pharma            Cancer, HIV/AIDS
Vinorelbine             Pierre Fabre              Cancer
Voglibose               Takeda                    Diabetes, obesity
Z-100                   Zeria                     Immunomodulation


2 DRUGS AFFECTING THE CENTRAL NERVOUS SYSTEM

2.1 Morphine Alkaloids

The history of the opium alkaloids is too well known to warrant
repetition here, but the analgesics based on morphine (1) are too
important to be left out of an account of natural products as leads.
Thus we shall summarize the clinically more important developments
that have occurred since the isolation of morphine in 1803. Codeine
(2) continues to be used widely for the treatment of moderate pain
and, although present in the opium poppy (Papaver somniferum), it is
normally synthesized in higher yield from morphine (4).

Other than codeine, the earliest significant semisynthetic
derivative of morphine is the diacetate heroin (3), which is still
widely used in terminal cancer where its addictiveness is
irrelevant. Acetylation masks the polar hydroxy groups, so that
penetration into the central nervous system (CNS) is enhanced;
hydrolysis then occurs to liberate the phenolic hydroxyl, giving an
active analgesic, and ultimately regenerates morphine (5). Heroin
was thus one of the first prodrugs.

(1) morphine R1 = R2 = H

(2) codeine R1 = CH3, R2 = H

(3) heroin R1 = R2 = COCH3

Modifications to the C-ring of morphine are legion, but none of the
derivatives is free from addictive liability, though many have been
used clinically. N-Demethylation and realkylation yield more
interesting analogs, notably N-allylnormorphine (nalorphine) (4),
which is a morphine antagonist (6). Further modification leads to
naloxone (5), which unlike nalorphine has very little agonist
activity (7) and has retained a place in therapy for treatment of
opiate-induced respiratory depression. Naloxone will also
precipitate withdrawal symptoms in opiate addicts, thereby
facilitating diagnosis.



Total synthesis of morphine is difficult, but analogs lacking the
dihydrofuran ring are accessible (8) from 1-benzylisoquinolines, in
analogy with the biosynthesis of morphine, to give the morphinans
(6). The system may be simplified even further (9), to give the
benzomorphans (7), although neither these nor the morphinans have
provided the long-sought analgesic without addictive properties.

(6) morphinan

(7) benzomorphan


A semisynthetic route to morphine analogs was found (10) from
thebaine (8) using Diels-Alder reactions in the C-ring. Adducts such
as (9) have the distinction of enormous potency (11), sufficient to
immobilize rhinoceroses at moderate dose levels! Unfortunately, the
addictive liability runs parallel to the increase in analgesic
potency, a tendency that was partly overcome (12) in the analog
buprenorphine (10).

(8) thebaine

All this work was carried out in ignorance of the nature of the
natural transmitter(s), which subsequently proved to be the peptides
known as endorphins and their pentapeptide fragments, the
enkephalins (13). It is perhaps significant that vastly improved
understanding of the biochemical basis for analgesia and the
characterization of a family of related receptors (14), known as δ,
κ, and μ, have so far failed to yield any better drugs for the
treatment of pain.

A series of analgesics that were discovered initially in an attempt
to obtain smooth muscle relaxants based on another natural product,
atropine (11), started with the observation that meperidine
(pethidine) (12) unexpectedly produced a reaction in mice known as
Straub tail, normally characteristic of the morphine series (15).
Meperidine itself is still used widely in childbirth in the belief
that there is a lower incidence of respiratory depression in the
fetus. The realization that 4-phenylpiperidines, which are not
obvious structural analogs of morphine, could give rise to useful
analgesic effects, led to the synthesis of many thousands of
derivatives (16), many with far greater potency than that of
meperidine. Unfortunately, as potency increases so do addiction
liability and respiratory depression.

(11) atropine

(12) pethidine

2.2 Conotoxins

Elan Pharmaceuticals is developing SNX-111 (Ziconotide), the
synthetic equivalent of ω-Conopeptide-MVIIA, found in the venom of
the predatory marine snail Conus magus, for the treatment of severe
pain and ischemia by the intrathecal or intravenous routes. The
peptide has the structure
H-Cys1-Lys-Gly-Lys-Gly-Ala-Lys-Cys8-Ser-Arg-Leu-Met-Tyr-Asp-Cys15-Cys16-Thr-Gly-Ser-Cys20-Arg-Ser-Gly-Lys-Cys25-NH2
cyclic(1-16),(8-20),(15-25)-tris(disulfide), which does not make it
an easy target for synthesis and gives it poor distribution
properties in vivo (17).

(13) conotoxin analog
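The residue numbering and disulfide pattern quoted for the peptide can be checked mechanically. The sketch below writes out the sequence in three-letter codes, with residue 13 given as Tyr (the standard abbreviation for tyrosine), and confirms that every stated bridge joins two cysteines and that each cysteine is used exactly once. The helper name `bridges_are_consistent` is hypothetical.

```python
# Sequence of the peptide as given in the text, N- to C-terminus.
sequence = (
    "Cys-Lys-Gly-Lys-Gly-Ala-Lys-Cys-Ser-Arg-Leu-Met-Tyr-Asp-"
    "Cys-Cys-Thr-Gly-Ser-Cys-Arg-Ser-Gly-Lys-Cys"
)
residues = sequence.split("-")

# Disulfide bridges from the "cyclic(1-16),(8-20),(15-25)" descriptor (1-based).
disulfides = [(1, 16), (8, 20), (15, 25)]

# 1-based positions of every cysteine in the chain.
cys_positions = [i for i, name in enumerate(residues, start=1) if name == "Cys"]


def bridges_are_consistent(cys_positions, bridges):
    """True if the bridge endpoints are exactly the cysteines, each used once."""
    endpoints = sorted(position for bridge in bridges for position in bridge)
    return endpoints == sorted(cys_positions)


print(len(residues), cys_positions, bridges_are_consistent(cys_positions, disulfides))
```

The check shows a 25-residue chain with six cysteines at the numbered positions, all of them tied up in the three bridges, which is consistent with the compact, cross-linked structure the text describes as a difficult synthetic target.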

SNX-111 blocks N-type calcium channels, which are located throughout
the CNS on neuronal somata, dendrites, dendritic spines, and axon
terminals, where they play a major role in the regulation of the
neurotransmitters associated with pain transmission and stroke. The
drive is to discover an orally active, selective, small-molecule
modulator of N-type calcium channels to overcome the disadvantages
of administration of SNX-111.

High-throughput screening campaigns have resulted in a number of
leads being identified, whereas others have chosen to modify known
drugs shown to block N-type channels. Workers at Parke-Davis,
however, employed a ligand-based approach using the
three-dimensional solution structure of the peptide (18). Compounds
such as (13) were designed where key binding motifs are attached to
an alkylphenyl ether scaffold. The compound had an IC50 value of
3.3 µM in a human N-type channel assay but showed no selectivity
over the L-type channel. Structure-activity work on the conotoxins
has shown that other regions of the peptide, absent in these
synthetic ligands, are responsible for channel family selectivity
(17, 18).

2.3 Cannabinoids

The plant Cannabis sativa has been used by humans for thousands of
years, both for the effects when ingested and for making rope from
the fibers in the stem. The major constituent of pharmacological
interest is Δ9-tetrahydrocannabinol (14) (THC), which has a
multiplicity of actions. In animals the effects include sedation and
apparent hallucinations (19), which are similar to the major effects
in the CNS in humans. There are also cardiovascular effects, notably
tachycardia and postural hypotension, that can be separated from the
CNS action, as in the synthetic analog Δ6a,10a-dimethylheptyl-THC
(15), which has minimal CNS activity (20).

(15)


Given the widespread illicit use of C. sativa, it was perhaps
inevitable that eventually one or two cancer patients receiving
chemotherapy would dose themselves with their own sedative in the
form of marijuana. An unexpected blessing from this uncontrolled
combination was a reduction in the nausea experienced during
chemotherapy. A variety of anticancer agents cause severe nausea and
vomiting, including nitrogen mustard, adriamycin, 5-azacytidine,
cyclophosphamide, and methotrexate: the unique situation arose in
which the remedy was discovered by the patients themselves (21).
Although smoking reefers gives rapid absorption and close control of
the effects, smoking is itself carcinogenic and cannot be
recommended to those who are unaccustomed to it; thus, when the
physicians in charge were made aware of their patients' discovery,
they devised a controlled clinical trial in which measured doses of
THC were dissolved in sesame oil and administered in gelatin
capsules. A placebo was similarly prepared for use in a randomized,
double-blind, crossover experiment (21). The results left no doubt
that a majority of patients benefited from THC pretreatment, even
those who had previously been refractory to the effects of the
standard antiemetics such as prochlorperazine. There remained the
problem of tachycardia associated with THC treatment. The
multiplicity of effects of THC has led to the synthesis of large
numbers of analogs (22), particularly in the hope of finding
non-morphine-like analgesics without addictiveness and without the
other CNS effects of THC. The analog nabilone (16) had been shown to
exert less effect than that of THC on the cardiovascular system,
while retaining the mixture of CNS actions, including analgesic,
antianxiety, and antipsychotic properties (23). When tested as an
antiemetic, nabilone proved to be superior to THC (24) and has been
used for this purpose for more than 30 years. The first 10 years of
clinical experience was reviewed (25).

After the demonstration of THC binding sites in the CNS (26), a
search for an endogenous ligand produced the long-chain ethanolamine
derivative (17) of arachidonic acid, known as anandamide (27).
Subsequently, the glycerol ester of arachidonic acid (18), known as
2-AG, was shown to be a more abundant endogenous ligand in the brain
than anandamide (28). Further development has tended to concentrate
on analogs of the natural ligands, notably the methyl derivative of
anandamide (19), which is resistant to the amide hydrolase that
terminates the action of anandamide itself and the dimethylheptyl
analog (20) that is traceable to the earlier modifications to THC
(29). Such analogs tend to have activity similar to that of THC.

(17) anandamide



An interesting twist in the tail is provided by the observation that
anandamide is also a ligand for the so-called enigmatic vanilloid
receptors, previously characterized through their interactions with
two other natural products, capsaicin (21) and resiniferatoxin (22)
(30) and responsible for the "hot" sensation caused by compounds in,
for example, chilies. A functional vanilloid receptor was cloned in
1997 and is activated by heat and acid as well as the chemical
ligands (30). A combination of the anandamide structure with a
vanilloid motif, as in AM404 (23), enhances the anandamide transport
inhibitory properties (29). The situation is complex from the
viewpoint of drug design, not least because there are two
cannabinoid (CB) receptors, plus a hydrolase and a transport
protein, interference with any or all of which might provide new
drugs.

(19) R-methanandamide

(23) AM404
The cannabinoid acids, which are devoid of psychotropic activity,
are promising anti-inflammatory agents (31) and it is possible that
the next useful therapeutic agent will come from this direction,
rather than the sought-after analgesic.

(21) capsaicin

2.4 Asperlicin

Cholecystokinin (CCK) is a peptide hormone, present in the gut and
CNS; it is one of the most abundant peptides in the brain (32, 33).
The whole peptide is composed of 33 amino acids, but the C-terminal
octapeptide H-Asp-Tyr(SO3H)-Met-Gly-Trp-Met-Asp-Phe-NH2 possesses
the full range of activities, sufficient for it to be classed as a
neurotransmitter (34). Specific, high-affinity binding sites have
been found on mammalian CNS cell membranes and in other organs such
as pancreas, gall bladder, and colon (35). The latter have been
classed as CCK-A receptors, but the majority of CNS receptors were
classed as CCK-B, based on affinity differences for various agonists
and antagonists (36). To confuse the issue slightly, the gastrin
receptor in the stomach is closely related to the CCK-B (now known
as the CCK2) receptor (37) and is stimulated by the C-terminal
tetrapeptide of CCK: in the periphery, gastrin receptors are the
same as CCK2 receptors (38).

The effects of CCK on intestinal smooth muscle and pancreas are easy
to demonstrate pharmacologically, unlike the role in the CNS, which
is a matter for conjecture. It was assumed that the CNS activity
must be significant, given the abundance of the peptide in the
brain, and that the discovery of antagonists might lead to new drug
treatments, as yet unspecified (39).

Fishing in microbial broths, using radioreceptors as bait, produced
asperlicin (24), the first potent, competitive and selective CCK-A
(CCK1) antagonist, from a culture medium of Aspergillus alliaceus
(40).




Asperlicin is moderately potent, poorly soluble in water, and not
bioavailable by the oral route (41). When discovered it was also,
with morphine, one of the very few nonpeptides with affinity for a
peptide receptor (peptoids are discounted in this assessment). It
was an interesting target for synthetic modification, particularly
viewed as a benzodiazepine derivative with potential CNS activity.

Based on the benzodiazepine nucleus, and an overt mimic of diazepam,
one of the first successful synthetic analogs was L-364,286 (25),
which had potency on CCK-A receptors similar to that of asperlicin.
Better receptor affinity was achieved with 3-amide-substituted
benzazepines: the 2-indolyl derivative L-364,718, also known as
MK-329 (26), is five orders of magnitude more potent than asperlicin
(42) at CCK-A receptors and is a valuable pharmacological tool.




Modification of the 3-amide to give a urea linkage as in (27) led to
a reduction in CCK-A receptor affinity. Importantly, discrimination
between CCK-A and CCK-B receptors by (27) is governed by the
stereochemistry at C3, the (S)-enantiomer showing greater affinity
for CCK-A receptors. The (R)-enantiomer, known as L-365,260, prefers
CCK-B receptors, antagonizes gastrin-stimulated acid secretion in
animal models, and, among other CNS effects, induces analgesia in
primates and displays anxiolytic properties (32).



Further development in this series has very substantially improved receptor affinity: YM-022 (28) has IC50 0.05 nM/kg (38). Clinical trials of compounds in this series have been disappointing because of poor bioavailability, but the general concept of finding a therapeutic agent through antagonism of CCK2 receptors is still viable and it is reported that the number of patents in this area has increased in the last 5 years (43).



(28) YM-022 


3 NEUROMUSCULAR BLOCKING DRUGS 

3.1 Curare, Decamethonium, and Atracurium

The development and use of muscle relaxants, to allow a reduction in the level of anesthesia during surgery, follows entirely from studies of South American arrow poisons (44) and particularly from the isolation by King (45) of pure D-tubocurarine (29) in the 1930s, from tube curare. Another of the South American blowpipe poisons, calabash curare, was used for similar purposes and developed (46, 47), to give alcuronium (30) from the alkaloid C-toxiferine I (31). Both types of curare paralyze skeletal muscle by a similar mechanism, antagonizing the effect of acetylcholine at the neuromuscular junction (48).



(32) metocurine R = CH3


The muscle-paralyzing curare alkaloids are quaternary salts that are not absorbed when taken orally. For surgical procedures they must be administered by intravenous injection, which results in onset of paralysis in at most a few minutes: anesthesia is normally induced before administration of the muscle relaxant (44), which is followed by artificial respiration. Although the neuromuscular blocking agents are potentially lethal when administered alone, in the environment of an operating theater they are truly life-saving drugs that have made a major impact on survival rates during surgery.

(31) C-toxiferine I R = CH3
(30) alcuronium R = CH2CH=CH2

At the time of King's work in the 1930s there were no spectroscopic aids to structure elucidation, and it is not surprising that he made a small error in the structure assigned to D-tubocurarine, believing it to have two quaternary nitrogens, a mistake that was not corrected (49) until 1970. The methylation product of D-tubocurarine, known as metocurine (32), is a more potent muscle relaxant. It was known for a long time as dimethyltubocurarine because of the error in the structure allocated to compound (29). King's error, in assigning a bisquaternary structure to a molecule with one quaternary and one protonated tertiary nitrogen, led to a large number of highly active synthetic bisquaternaries. The simplest of these was decamethonium (33), which was nothing more than two trimethylammonium end groups connected with a decamethylene chain. As one of a series with different chain lengths (50), decamethonium became the prototype for many more complex structures with 10 atoms between the quaternary centers, which appeared to be optimal for binding to the acetylcholine receptor at the neuromuscular junction.

(33) decamethonium

Unlike tubocurarine, decamethonium depolarizes the muscle endplate, rendering the membrane insensitive to acetylcholine (48). The action of tubocurarine is competitive and can be overcome with increased concentrations of acetylcholine, brought about by administration of an anticholinesterase: the latter is thus an antidote to tubocurarine, but not to decamethonium. Despite the lack of an antidote, decamethonium was used very widely for over two decades. One of its disadvantages is an overlong duration of action, during which time the patient has to be maintained on artificial respiration, because the muscle of the diaphragm is also susceptible to the actions of the drug. An early and highly successful attempt (51) to shorten the action of decamethonium gave suxamethonium (34), a diester formed between succinic acid and two molecules of choline, which hydrolyzes rapidly in the presence of pseudocholinesterase.
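The statement that a competitive block can be overcome by raising the acetylcholine concentration can be illustrated numerically. The sketch below uses the Gaddum equation for competitive antagonism, which is standard receptor theory and is not taken from this chapter. The function name and all concentrations and dissociation constants are hypothetical values chosen only for illustration, and the model says nothing about a depolarizing agent such as decamethonium.

```python
def agonist_occupancy(agonist, k_a, antagonist=0.0, k_b=1.0):
    """Fractional receptor occupancy by the agonist (Gaddum equation).

    agonist, antagonist: free concentrations (arbitrary but matching units)
    k_a, k_b: dissociation constants of agonist and competitive antagonist
    """
    return agonist / (agonist + k_a * (1.0 + antagonist / k_b))


# Hypothetical values: agonist at its K_A, antagonist at nine times its K_B.
baseline = agonist_occupancy(agonist=1.0, k_a=1.0)
blocked = agonist_occupancy(agonist=1.0, k_a=1.0, antagonist=9.0, k_b=1.0)

# Raising the agonist tenfold (dose ratio = 1 + [B]/K_B = 10) restores
# the original occupancy, which is the hallmark of competitive block.
restored = agonist_occupancy(agonist=10.0, k_a=1.0, antagonist=9.0, k_b=1.0)

print(baseline, blocked, restored)
```

In this simplified picture an anticholinesterase acts by raising the agonist term, which is why it reverses the effect of a competitive blocker.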

Tubocurarine suffers from cardiovascular side effects induced by direct interactions with ganglionic acetylcholine receptors and from stimulation of histamine release, so analogs have been well worth pursuing. The macrocyclic structure of tubocurarine is a difficult synthetic target, but fortunately ring-opened analogs, such as laudexium (35), have high potency and relatively few side effects (52). The main problem with (35) is the duration of action, which at about 40 min is too long for many operations. Two approaches have been used to shorten the duration of action. The concept of pH-controlled Hofmann elimination was employed successfully (53) in the design of atracurium (36), which in clinical use (54) has the big advantage that the drug disappears at a constant rate, irrespective of liver or kidney function. Some ester hydrolysis contributes to the destruction of atracurium in vivo, as might be expected. A slightly later development (55) centered on an empirical search for structures that would undergo ester hydrolysis more rapidly, resulting in mivacurium (37), which has a slightly shorter duration of action than that of atracurium, the latter being about 15-20 min.

Decomposition of suxamethonium (34) by pseudocholinesterase



4 ANTICANCER DRUGS 

4.1 Catharanthus (Vinca) Alkaloids 

In 1949 Canadian researchers at the University of Western Ontario began investigating the medicinal properties of the rosy periwinkle (Catharanthus roseus), a plant that had been used for many years to treat diabetes mellitus in the West Indies. Despite finding that the plant extract when given orally had no effect on blood sugar levels in rats or rabbits, the researchers noted that when given intravenously, the extract caused the animals to succumb to bacterial infection and die. This curious observation prompted further studies, which showed that the plant extract reduced levels of white blood cells, causing granulocytopenia and bone marrow damage, toxic effects that are encountered with many antitumor drugs (56). These findings led the Canadian group to isolate an alkaloid fraction with potent cytotoxic activity. The active principle was eventually purified and became known as vinblastine (38), a dimeric indole-dihydroindole alkaloid.

Concurrently, researchers at the Lilly Research Laboratories had been investigating extracts of C. roseus and they too had detected cytotoxic activity, specifically against acute lymphocytic leukemia (57, 58). The U.S. group isolated several alkaloids, including vinblastine and another closely related alkaloid, vincristine (39).

Although many other alkaloids have been isolated from C. roseus, only vinblastine and vincristine have been developed for clinical use. The antiproliferative activity of the two compounds is related to their specific interaction with tubulin, thus preventing assembly of tubulin into microtubules and arresting cell division (59). However, despite this apparent identical mechanism of action and their clear chemical similarities, vinblastine and vincristine display very different clinical effects. Vinblastine, for example, is used to treat Hodgkin's disease and metastatic testicular tumors, whereas vincristine is used mainly in combination with other anticancer drugs for the treatment of acute lymphocytic leukemia in children. Toxicity profiles are also different, in that vinblastine causes bone-marrow depression, whereas peripheral neuropathy often proves to be dose-limiting in vincristine therapy.

(37) mivacurium


Lilly introduced vinblastine and vincristine into the clinic in 1960 and 1963, respectively, but this did not preclude the search for improved derivatives. A chemical modification program aimed at improving antitumor activity and reducing toxicity was initiated in 1972 (60). Concern about the neurotoxicity displayed by vincristine, its chemical instability, and low natural abundance (0.03 g/kg dried plant material) led to vinblastine's being chosen as a template for semisynthetic modification. Selective ammonolysis of the ester function at C-3 and hydrolysis of the adjacent acetyl group yielded the desacetyl vinblastine amide, vindesine (40). Better yields of vindesine were obtained by treating the hydrazide (41) with nitrous acid and reacting the resultant azide (42) with ammonia. The azide (42) proved to be a useful intermediate for the preparation of a range of substituted amides, although vindesine proved to be the derivative of choice, with significant differences in the spectrum of antitumor activity and toxicity compared to that of the naturally occurring alkaloids. Phase I clinical trials commenced in 1977 and vindesine has been used for the treatment of non-small cell lung cancer, lymphoblastic leukemia, and non-Hodgkin's lymphomas. In combination with cisplatin, vindesine ranks among the foremost treatments for non-small cell lung cancer with respect to response rate and survival (61). Back in the 1950s, the U.S. researchers could not have guessed that 30 years on, the demand for Catharanthus alkaloids would necessitate the processing of around 8000 kg of plant material per year (62)!

(38) R = CH3

(39) R = CHO

(40)

(41)

(42)


4.2 Camptothecin 

Camptothecin (43) was first isolated by Monroe Wall and Mansukh Wani in 1966, after ethanolic extracts of Camptotheca acuminata, a tree native to China, showed unusual and potent antitumor activity (63). Starting with 19 kg of dried wood and bark, Wall and Wani painstakingly purified the principal active component with a combination of hot solvent extraction, an 11-stage Craig countercurrent partition process, silica gel chromatography, and crystallization. Camptothecin was characterized as a novel pentacyclic alkaloid, present as just 0.01% w/w of the stem bark of C. acuminata. Of particular note was the unusual activity that camptothecin displayed in L1210 and P388 mouse leukemia life-prolongation assays. The compound also inhibited the growth of solid tumors in vivo and the water-soluble sodium salt was progressed to phase II clinical trials before being withdrawn because of severe bladder toxicity.

(43) Camptothecin: R1 = R2 = R3 = H

(44) 10-hydroxycamptothecin: R1 = R2 = H, R3 = OH

(48) 7-ethyl-10-hydroxycamptothecin: R1 = C2H5, R2 = H, R3 = OH

(46) Topotecan: R1 = H, R2 = CH2-N(CH3)2, R3 = OH

Interest in camptothecin gained new impetus in 1985, when it was discovered that the compound exerts its antitumor activity through a novel mechanism of action (64). Camptothecin binds to the covalent complex formed by topoisomerase I and DNA, which initiates DNA replication, and thus stabilizes the enzyme-DNA complex and prevents cell proliferation. The elucidation of the mechanism of action provided a means of evaluating camptothecin analogs as topoisomerase inhibitors in vitro and efforts then focused on synthesizing water-soluble analogs with broad-spectrum antitumor activities. The α-hydroxy lactone (ring E) and, in particular, the 20(S)-form proved essential for maintaining biological activity, but the 10-hydroxy analog (44) showed greater activity than that of (43) (65). Wall and Wani successfully deployed the Friedländer reaction between substituted 2-aminobenzaldehydes and the tricyclic intermediate (45), to synthesize a variety of ring-A-substituted analogs. These studies may have prompted SmithKline Beecham (now GlaxoSmithKline) to synthesize the water-soluble 10-hydroxycamptothecin analog topotecan (46) that was first approved in 1996 for the treatment of recurrent ovarian cancer and, 2 years later, for small cell lung cancer (66). Irinotecan (47), developed by Daiichi and Yakult Honsha in Japan and marketed by Pharmacia, was also approved in 1996 for the treatment of advanced colorectal cancer. Irinotecan is inactive as a topoisomerase I inhibitor, but acts as a prodrug of the active 7-ethyl-10-hydroxycamptothecin (48) (67).

(47) Irinotecan: R1 = C2H5, R2 = H,



4.3 Paclitaxel and Docetaxel 

Regarded as the tree of death by the Greeks and used to prepare arrow poison by the Celts, the yew tree has been associated with death and poisoning for centuries (68, 69). The English yew, Taxus baccata, was used to make funeral wreaths and it was believed that one could die by merely standing beneath the boughs of the tree.

Yew certainly contains highly toxic metabolites and their potency and rapid action have often made extracts of yew the poison of choice for numerous murders and suicide attempts. It is thus ironic that extracts from the Pacific yew, T. brevifolia, after being tested in the National Cancer Institute's (NCI) screening program during the 1960s, yielded what was described (70) as the most exciting anticancer compound discovered in the previous 20 years; that is, paclitaxel (49) (originally given the name taxol by Wall and Wani).

(45)



The initial isolation and characterization of paclitaxel proved particularly difficult because of (1) its very low natural abundance in T. brevifolia bark (although this was the best known source, the isolated yield was only 0.02% w/w, equivalent to 650 mg per tree), (2) the poor analytical data obtained from the purified compound, and (3) the failure of paclitaxel to give crystals that were suitable for X-ray analysis (71). The structure of paclitaxel was published in 1971 (72), but further biological testing continued to be troubled by difficulties. The compound showed only modest in vivo activity in various leukemia assays, which was no better than that displayed by a number of other new compounds at the time. In addition to the limited supplies of paclitaxel (the complexity of the molecule precluded chemical synthesis), the compound was very poorly soluble in water, which made formulation difficult. However, various new assays were developed in the 1970s, including the murine B16 melanoma model, in which paclitaxel showed very good activity, and another boost came when Horwitz et al. (73) discovered that the compound prevented cell division by a unique mode of action. In contrast to the antimitotic vinblastine and podophyllotoxin analogs (q.v.), which prevent microtubule assembly, paclitaxel inhibits cell division by promoting assembly of stable microtubule bundles, which leads to cell death.



Phase I clinical trials were initiated in 1983, but these were to proceed at a slow and tortuous pace and proved all but disastrous when the high levels of oil-based adjuvant used to formulate paclitaxel caused severe allergic reactions in many volunteers. Undaunted by the formulation problem and spurred on by paclitaxel's novel mechanism of action, clinicians were able eventually to minimize the allergic events and demonstrate useful activity. Phase II clinical trials began in 1985 despite continuing supply problems, and 4 years later the program received a significant boost when McGuire et al. (74) reported good responses from patients suffering from refractory ovarian cancer, a disease that kills some 12,500 women a year in the United States alone.

In many ways, the development of paclitaxel mirrored that of the camptothecin analogs, both being dogged for many years by supply issues, poor pharmacokinetics, and toxicity, but the subsequent uncovering of novel mechanisms of action fueled renewed efforts to develop these leads into important new anticancer agents (75).

In 1991 Bristol-Myers Squibb in conjunction with the NCI agreed to manage the supplies of paclitaxel and were granted a licence to further develop the compound. The following year the U.S. Food and Drug Administration approved paclitaxel for the treatment of ovarian cancer in patients unresponsive to standard treatments, and in December 1993 approval was given for the treatment of metastatic breast cancer.

The sourcing of paclitaxel from T. brevifolia was a major problem (76) because to treat just the groups of patients suffering ovarian cancer in the United States would require about 25 kg of compound per year, necessitating the felling of some 38,000 trees (70)! Although the Pacific yew is not a rare tree, it is extremely slow growing and such harvesting could not be sustained indefinitely. It has been estimated that there were enough trees available to maintain a supply of paclitaxel for only 2-7 years (77). The isolation of paclitaxel from other Taxus species has been investigated at length and reasonable quantities have been obtained from the needles of several species including T. baccata. Using the needles has alleviated the supply problem because they can be harvested without damaging the tree. However, the needles contain much higher quantities of several biosynthetic precursors of paclitaxel and two of these, baccatin III (50) and 10-desacetylbaccatin III (51), have been used to prepare paclitaxel semisynthetically. One approach, developed by Potier et al. (78), involved acylation of the sterically hindered C-13 position of baccatin III with cinnamic acid and subsequent double-bond functionalization through hydroxyamination, to give paclitaxel together with various regio- and stereoisomers. A better approach involved protection of 10-desacetylbaccatin III as the triethylsilyl ether, followed by direct acylation with the phenylisoserine derivative (52), giving paclitaxel in 38% overall yield (79). Further improvements were made using less sterically demanding acylating reagents; for example, acylation with the β-lactam (53) gave paclitaxel in up to 90% yield (80) and this may be the preferred method for commercial production in the future.
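The supply figures quoted for bark-derived paclitaxel can be checked with simple arithmetic. The sketch below takes the isolated yield of about 650 mg per tree (0.02% w/w of bark) and the estimated annual requirement of 25 kg from the text. The function names are illustrative only, and the assumption that every tree gives the same average yield is a simplification.

```python
import math


def trees_needed(demand_kg, yield_mg_per_tree):
    """Whole trees required to supply demand_kg of compound per year."""
    demand_mg = demand_kg * 1_000_000  # 1 kg = 1,000,000 mg
    return math.ceil(demand_mg / yield_mg_per_tree)


def bark_per_tree_kg(yield_mg_per_tree, fraction_w_w):
    """Bark mass (kg) implied by a per-tree yield at a given w/w fraction."""
    return yield_mg_per_tree / fraction_w_w / 1_000_000


# Figures from the text: 25 kg per year, 650 mg per tree, 0.02% w/w.
print(trees_needed(25.0, 650.0))
print(bark_per_tree_kg(650.0, 0.0002))
```

Under these assumptions the requirement comes to roughly 38,500 trees a year, in line with the "some 38,000 trees" quoted, and the yield corresponds to a little over 3 kg of bark per tree.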



(50) R = COCH3

(51) R = H

(53)

These semisynthetic approaches also provide access to analogs with potential advantages over paclitaxel itself. Structure-activity studies have shown that, although the oxetane ring appears to be essential for activity, wide variation in the nature and stereochemistry of the C-13 ester side-chain can be tolerated. Thus, the N-(tert-butoxycarbonyl) derivative, docetaxel (54), which appears to be more potent than paclitaxel (81) and has better solubility characteristics, has been developed and launched by Aventis for the treatment of ovarian, breast, and lung cancers.



Various "protaxols," designed to release paclitaxel in situ under physiological conditions, have been prepared by acylating the C-2' hydroxyl group. Nicolaou et al. (82) reported the synthesis of the sulfone (55), which is soluble and stable in aqueous media, but is able to release paclitaxel rapidly in human blood plasma.



Plant tissue culture (70), microbial fermentation (83), and total synthesis (84, 85) provide other possibilities for the production of paclitaxel and its derivatives, although it is far from certain whether any of them will be commercially viable.

1. (C2H5)3SiCl

2. CH3COCl




4.4 Epothilones 

Epothilones A (56) and B (57), 16-membered macrocyclic polyketide lactones, were first isolated from the cellulose-degrading myxobacterium Sorangium cellulosum by Hoefle, Reichenbach, and coworkers (86) as narrow-spectrum antifungal and cytotoxic metabolites. The compounds were then tested by the National Cancer Institute in the United States and found to be highly active against breast and colon cancer cell lines (87). Subsequently, Bollag et al. (88) at the Merck Research Laboratories discovered that the epothilones stabilize microtubule assembly and thus inhibit cell division by the same mechanism as that of paclitaxel (see above). This observation, together with their less complex chemical structure, increased water solubility, more rapid action in vitro, and effectiveness against multidrug-resistant tumor cells, has prompted significant interest in the epothilones as anticancer agents.

(56) epothilone A: X = O, R = H

(57) epothilone B: X = O, R = CH3

(59) BMS-247550: X = NH, R = CH3

On learning the absolute stereochemistry of (56) and (57), three academic research groups embarked on the total synthesis of the epothilones. Nicolaou, Danishefsky, and Schinzer independently adopted successful, elegant synthetic approaches involving olefin metathesis, macrolactonization, Suzuki coupling, or ester-enolate-aldehyde condensation (89). Within 3 years of the disclosure of their absolute stereochemistry, 17 different total syntheses of the natural products were reported. These syntheses paved the way for the generation of a large number of epothilone analogs for biological evaluation, including the use of solid-phase combinatorial approaches.

The academic groups focused on modifications around the core macrocyclic lactone, establishing important structure-activity relationships, but not improving on the in vitro biological activity of the most active natural product, epothilone B (57). In vivo biological data were comparatively scarce and, although one group reported that epothilones B (57) and D (58) showed activity in murine tumor models, researchers at Bristol-Myers Squibb have shown that (58) lacks in vivo activity as a result of rapid metabolic inactivation (90). It was postulated that esterase-mediated hydrolysis of the macrocyclic lactone formed an inactive ring-opened species and, therefore, efforts were focused on replacing the lactone with a more stable macrocyclic lactam moiety. Several macrocyclic lactam derivatives were synthesized from (57) and (58). Of note was the preparation of BMS-247550 (59) in a three-step synthesis from epothilone B (57), utilizing a novel Pd(0)-catalyzed ring-opening reaction followed by reduction and macrolactamization. BMS-247550 (59), which is in phase I clinical trials, retains its activity against human cancer cells that are naturally insensitive to paclitaxel or that have developed resistance to paclitaxel, both in vitro and in vivo (91).



4.5 Podophyllotoxin, Etoposide, and Teniposide

The development of the natural constituents of Podophyllum Resin into effective semisynthetic and, ultimately, totally synthetic compounds for the treatment of various kinds of cancer provides one of the most sustained and intriguing stories of drug discovery (92, 93). The story has all the classic ingredients, starting with observation and reasoning, extending through chance into new areas, and characterized throughout by persistence and determination, particularly when biological activity had to be traced to very minor constituents in the crude plant extract.

Podophyllum peltatum (may apple, or American mandrake) and P. emodi are, respectively, American and Himalayan plants, widely separated geographically but used in both places as cathartics in folk medicine (94). An alcoholic extract of the rhizome known as podophyllin was included in many pharmacopoeias for its gastrointestinal effects; it was included in the U.S.P., for example, from 1820 to 1942. At about this time the beneficial effect of podophyllin, applied topically to benign tumors known as condylomata acuminata, was demonstrated clinically (95). This usage was not inspirational, given that there are records of topical application in the treatment of cancer by the Penobscot Indians of Maine and, subsequently, by various medical practitioners in the United States from the 19th century (96). The crude resinous podophyllin is an irritant and unpleasant mixture unsuited to systemic administration.

The first chemical constituent was isolated from podophyllin in 1880 and named podophyllotoxin (97). A structure was proposed in 1932 and after some fine-tuning (98) was shown to be the lignan (60). As might be expected, the crude resin contains a variety of chemical types, including the flavonols quercetin and kaempferol (99). Although these other constituents undoubtedly have biological activity, it is the lignans that have received most attention and to which we shall devote the remainder of this section.

Chemists at Sandoz in the early 1950s reasoned that crude podophyllin might contain lignan glycosides with anticancer activity, which might be more water soluble and less toxic than podophyllotoxin (92). The reasoning for the latter is not entirely clear, but in the event they proved to be correct in both respects. Careful isolation gave podophyllotoxin β-D-glucopyranoside (61), its 4'-desmethyl analog (62), and some less important lignans lacking the B-ring hydroxy group (100-102). Unfortunately, the sugar derivatives were less active as inhibitors of cell proliferation than were the aglycones, as well as less toxic; however, as expected, they were much more water soluble (92). While continuing work to isolate more natural lignans, a substantial program of structural modification of the known compounds was undertaken, with a view to protecting the glucosides from hydrolytic enzymes and also to improve cellular uptake. Most of these changes were ineffective: the per-acylated derivatives, for example, were insoluble in water and had inferior cytostatic effects (103).

(60) podophyllotoxin



(61) R = CH3

(62) R = H


Condensation of the glucosides with a variety of aldehydes was more useful, in that not all the hydroxy groups were blocked. Despite this, water solubility was a problem with the podophyllotoxin derivatives (63). Gastrointestinal absorption was greatly improved, however, as was chemical stability (104), and positive effects were observed in a few cancer patients with the benzylidene derivative (64). It was at this point that luck played a hand, backed up by a good deal of determination. A crude podophyllin fraction, which was simpler and cheaper to prepare than pure podophyllin glucoside, was also treated with benzaldehyde to give a mixture of benzylidene derivatives, about 80% of which was compound (64). The crude product was found to be more potent than compound (64) and subsequently to possess a different mode of action from that of the lead compounds: rather than arresting cells in metaphase, cells were prevented from entering mitosis altogether (105). The crude mixture was marketed for cancer treatment as Proresid.



(63) R1 = H, CH3; R2 = various alkyl, aryl

(64) R1 = CH3; R2 = C6H5


Improved biological assay methods (106) indicated the presence of an unknown, highly active constituent of Proresid. For example, Proresid prolonged the life of mice inoculated with L1210 leukemia cells (93), an effect that was not observed with the known major constituent. In the early 1960s chromatographic and spectroscopic techniques were not as highly developed as they are now and more than 2 years' work was required to isolate and identify the unknown component of the mixture, which proved to be the 4'-desmethoxy-1-epi analog (65) of the podophyllotoxin glucoside adduct (92). Present only in very small amounts in the derivatized extract, it was necessary to devise a synthesis from readily available materials. It was fortunate that the desired 1β configuration was readily secured from 1α-hydroxy-4'-desmethylpodophyllotoxin, itself obtained by selective demethylation of podophyllotoxin: the remainder of the synthesis would now be considered fairly routine (107).



(65) R = C6H5

(67) R = 2-thienyl

(68) R = CH3


Given a large supply of the key intermediate (66), it was straightforward to prepare a number of aldehyde derivatives, resulting in analogs with up to a 1000-fold increase in potency (108). The selected adducts were those prepared from thiophen-2-aldehyde, giving teniposide (67), and from acetaldehyde, giving etoposide (68). Both drugs are of value, etoposide in the treatment of small-cell lung cancer and testicular cancer, teniposide in the treatment of lymphomas and leukemias. The thiophene derivative is also of use in the treatment of brain tumors (93).

The natural products, podophyllotoxin and its congeners, are "spindle poisons" that inhibit cell proliferation by binding to tubulin and preventing formation of microtubules (105). Presumably this effect is sufficient to account for the success of podophyllin in the treatment of condylomata acuminata, although the crude extract contains many other candidates for a contribution to the biological activity. As has been described, a very minor component of the natural mixture, missing the 4' hydroxy group, having the 1β- instead of the 1α-hydroxy configuration and with this hydroxy group conjugated with β-D-glucose, must be treated with an aldehyde to produce the highly active and most important derivatives. These derivatives do not bind to tubulin, but have been shown to be inhibitors of topoisomerase II, which may account for most of the observed biological effects, including DNA strand breaks, that lead to anticancer activity (109).

(66)

4.6 Marine Sources 

Cytosine arabinoside (69), a synthetic analog of the C-nucleosides spongouridine (70) and spongothymidine (71) from the sea sponge Cryptotethya crypta, was the first and, so far, the only marine-derived compound used routinely as an anticancer agent (110). However, a number of chemically diverse natural products from marine sources have been progressed to clinical trials. The three most advanced compounds are in phase II trials; ecteinascidin-743 (72), a tetrahydroisoquino-

(73) Bryostatin 1



(74) Dolastatin 10 


The discovery that the fused β-lactam nucleus, 6-aminopenicillanic acid (6-APA) (76), could be obtained from cultures of Penicillium chrysogenum led to the preparation of new, semisynthetic derivatives with improved stability to gastric acid and β-lactamases, and with activity against a wider range of pathogenic organisms (121). Sheehan (122) showed that compound (76) would react readily with acid chlorides to form new penicillin derivatives with novel substituents at the 6-position. Methicillin (77), with a sterically demanding 2,6-dimethoxybenzamide side-chain, was the first semisynthetic penicillin to show resistance to staphylococcal β-lactamases, although the compound was still acid labile. Ampicillin (78) has an α-aminophenylacetamido side-chain and displays good activity against Gram-negative organisms; it is stable to acid and thus can be administered orally, although it is susceptible to degradation by β-lactamases. Amoxycillin (79) differs from ampicillin by the addition of a single hydroxy group, but the compound is better absorbed by the gastrointestinal tract.

Clavulanic acid (80), isolated from Streptomyces
clavuligerus, is similar in structure to
the penicillins, except that oxygen replaces sulfur
in the five-membered ring (123). Clavulanic
acid has weak antibacterial activity, but is a
potent inhibitor of β-lactamases (124). A mixture
of clavulanic acid and the β-lactamase-sensitive
amoxycillin was introduced in 1981
as Augmentin and has proved to be an effective
combination to combat β-lactamase-producing
bacteria (125). In 2001, 20 years after
its launch, Augmentin is the best-selling
antibacterial worldwide.

The clinical introduction of the penicillin
group of antibiotics prompted an intensive
search for novel antibiotic-producing organisms,
and Selman Waksman demonstrated the
value of actinomycetes in this role, discovering
the aminoglycoside streptomycin (81) from
Streptomyces griseus in 1943 (126). Pharmaceutical
companies also embarked on large
programs of screening soil samples for
antibiotic-producing microorganisms (127).
Chloramphenicol (82) was isolated from Streptomyces
venezuelae in 1948 and other clinically
important antibiotics followed: chlortetracycline
(83), neomycin (84), oxytetracycline (85),
erythromycin (86), oleandomycin (87), kanamycin
(88), and rifamycin (89).

(75) R = COCH2Ph
(76) R = H
(77) R = 2,6-dimethoxybenzoyl (see text)
(78) R = COCH(NH2)Ph
(79) R = COCH(NH2)C6H4OH

In 1948 Giuseppe Brotzu isolated the fungus
Cephalosporium acremonium from a water
sample collected off the coast of Sardinia.
The culture showed significant antimicrobial
activity, but Brotzu could not interest the Italian
authorities in his discovery. He then
turned to a friend in England for help, who
arranged for Howard Florey at Oxford to receive
a sample of the producing culture. Eventually,
an antibacterial substance was isolated
and named cephalosporin C (90) (128). The
compound, which had a structure similar to
that of the penicillins, except that it had a
dihydrothiazine ring fused to the β-lactam core,
showed good resistance to β-lactamases and
was less toxic than benzylpenicillin. However,
plans to market the compound were terminated
with the introduction of methicillin (see
above).

(81)
(82)
(83) R1 = Cl, R2 = H
(85) R1 = H, R2 = OH

The discovery that the basic structural
building block of cephalosporin C, that is,
7-aminocephalosporanic acid (7-ACA) (91),
could be synthesized led to the preparation of
numerous cephalosporin derivatives in a similar
way to the synthesis of penicillins from
6-aminopenicillanic acid (129, 130). Modification
of the substituent at the 7-position, while
retaining the 3-acetoxymethyl group, gave
cephalothin (92), cephacetrile (93), and cephapirin
(94), so-called first-generation cephalosporins
with good activity against Gram-positive
bacteria, although the acetyl ester was
susceptible to degradation by esterases and
thus limited the duration of action. Replacement
of the acetoxy group by other substituents
rendered the products less prone to esterase
attack. For example, the pyridinium
derivative, cephaloridine (95), has a longer
duration of action than that of cephalothin.

(87)

The first orally active cephalosporin was
cephaloglycin (96), which possessed a phenylglycine
substituent in the C-7 side-chain, although
the labile 3-acetoxymethyl group was
retained. Replacing the acetoxy group with a
proton or chlorine, for example, cephalexin
(97), cefadroxil (98), cephradine (99), and cefaclor
(100), extended the duration of action of
these orally active products. Cefaclor has been
classified as a second-generation cephalosporin
because it has a wider spectrum of activity,
which includes Gram-negative bacteria such
as Haemophilus influenzae. Cephamandole
(101) and cefuroxime (102) are parenterally
administered cephalosporins with similar activities
against clinically important Gram-negative
bacteria and are also resistant to
many types of β-lactamases.

(89)
(96) R1 = COCH(NH2)Ph, R2 = OCOCH3
(97) R1 = COCH(NH2)Ph, R2 = H

The newer third-generation cephalosporins,
including ceftazidime (103), ceftizoxime
(104), and ceftriaxone (105), which all contain
a 2-aminothiazolyl group in the C-7 side-chain,
have been developed for treating specific
pathogens such as Pseudomonas aeruginosa.
Thienamycin (106), isolated from
Streptomyces cattleya in 1976, represented a
new class of β-lactam antibiotics produced by
bacteria where the sulfur of the penicillin nucleus
was replaced by a methylene group
(131). An N-formimidoyl derivative, imipenem
(107), was the first example from this
new class of carbapenem antibiotics to become
available for clinical use (132). Imipenem has
a very broad spectrum of activity against most
Gram-positive and Gram-negative aerobic and
anaerobic bacteria.

Screening bacteria such as Pseudomonas
acidophila and Chromobacterium violaceum
for production of β-lactam antibiotics resulted
in the discovery of naturally occurring
monobactams, which had moderate antimicrobial
activity (133-135). Side-chain variations,
as developed for the penicillins and
cephalosporins, led to compounds with improved
activity against both Gram-positive
and Gram-negative bacteria. A derivative containing
the 2-aminothiazolyl group, a side-chain
component common to the third-generation
cephalosporins (see above), showed
specific activity against Gram-negative aerobic
bacteria, including Pseudomonas spp., and
was stable to most types of β-lactamases. The
compound aztreonam (108) became the first
commercially available monobactam and
showed a mode of action similar to that of the
other β-lactam antibiotics by blocking bacterial
cell wall synthesis (136).

(90) R = COCH2CH2CH2CH(NH2)COOH
(107)
(108)

5.2 Erythromycin Macrolides

Erythromycin (109) was isolated, in 1952,
from a strain of Saccharopolyspora erythraea
(formerly Streptomyces erythreus). As a
broad-spectrum antibiotic erythromycin has proved
invaluable for the treatment of bacterial infections
in patients with β-lactam hypersensitivity
and is also the drug of choice in the treatment
of infections caused by species of
Legionella, Mycoplasma, Campylobacter, and
Bordetella (137).

(109) Erythromycin A, R = H
(114) Clarithromycin, R = CH3

Although safe and effective, erythromycin
is not a perfect antibacterial. The presence of
hydroxy groups suitably disposed with respect
to the keto function at C-9 leads to the formation
of a hemiketal, which is dehydrated in
stomach acid to give the inactive Δ8
analog (111), which may undergo further ring
closure to give the 9,12-tetrahydrofuran (112)
that is also inactive (139). The Δ8 derivative
(111) may be responsible for some gastrointestinal
disturbance (140). To avoid these
problems by increasing the stability to acid,
the 2'-stearate, estolate, and ethylsuccinate
esters have been prepared (141), but even
when the tablets are enteric-coated the bioavailability
is erratic and relatively frequent
dosing is required (137).

An understanding of the acid-catalyzed decomposition
of erythromycin has led to a variety
of semisynthetic derivatives with improved oral
bioavailability (142). Reductive amination of
the 9-keto function gives erythromycylamine,
which reacts with (2-methoxyethoxy)acetaldehyde
(143) to give dirithromycin. Beckmann
rearrangement of the 9-oxime followed by reduction
and methylation (144) gives azithromycin (113),
which shows good activity against Gram-negative
bacteria, including Haemophilus influenzae.
An alternative for prevention of cyclization
between the 9-keto and 6-hydroxy is to mask the
6-hydroxy group. If the 6-hydroxy is methylated
(145), the result is clarithromycin (114), which,
like (113), has an improved pharmacokinetic
profile compared with that of the parent molecule.

(113) Azithromycin


Both azithromycin and clarithromycin
have been used for various bacterial infections
for a number of years. Within the last decade,
resistance has emerged to a range of antibacterials,
including the macrolides, arising from
methylation of an adenine in the 23S ribosomal
RNA target site, which prevents binding
(146). The invention of the ketolides [e.g.,
telithromycin (115)] overcomes MLSB resistance
by removing the L-cladinose moiety at
position 3; the exposed hydroxyl is also oxidised
to a ketone (147). The loss of potency
that would ensue is compensated by two further
modifications that improve binding:
formation of a carbamate at positions 11/12
and extension with a heterocycle-substituted
side-chain. In ABT 773 a similar side-chain is
placed at position 6, with comparable results
(147).

5.3 Streptogramins 

The streptogramins are produced by Streptomyces
species and have been classified into two
groups: Group A are polyunsaturated macrocyclic
lactones and Group B are cyclic
hexadepsipeptides. Both groups bind bacterial
ribosomes and inhibit protein synthesis at the
elongation step, and they act synergistically
against many Gram-positive microorganisms.
However, the naturally occurring streptogramins
are poorly soluble in water and this,
until recently, has limited their use to treat
bacterial infections. New, water-soluble derivatives
have been developed and the semisynthetic
dalfopristin (116) and quinupristin
(117) mixture (Synercid) has been approved
for the treatment of Gram-positive infections,
including multidrug-resistant strains of Enterococcus
faecium, Staphylococcus aureus,
and Streptococcus pneumoniae (148).



(115) Telithromycin 

5.4 Echinocandins

The fungal metabolite echinocandin B (118) is
one of the lipopeptides, in which a cyclic
hexapeptide is combined with a long-chain
fatty acid. Echinocandin B inhibits β-1,3-glucan
synthesis and as a result has anti-Candida
and anti-Pneumocystis carinii activity (149).
As a group, the echinocandins are not orally
bioavailable, are haemolytic, and are not very
water soluble (150), despite the hydrogen-bonding
ability of the polyhydroxylated
hexapeptide.



(118) echinocandin B, R = linoleyl 


Synthesis of the cyclic hexapeptide is unattractive
for the purpose of securing analogs
with improved biological activity because of
the unusual nature of the amino acids used
and the complex stereochemistry generated by
the high degree of hydroxylation. However,
echinocandin B can be produced efficiently by
fermentation of a culture of Aspergillus nidulans
and then deacylated by fermentation
with Actinoplanes utahensis (151). The free
amino group thus exposed can be derivatized
with a number of active esters. Synthesis of
the amide from 4-octylbenzoic acid gives cilofungin
(119), which has specifically high potency
against Candida albicans and some
other Candida species (151).

(117) Quinupristin


(119) cilofungin

For systemic use cilofungin had to be given
intravenously and unfortunately ran into
problems associated with the cosolvent (PEG)
(152). A better derivative, LY-303366 (120), is
now in clinical trial and has the major advantage
of oral bioavailability (153). Many other
antifungal peptides are under investigation
(152). The member of this series that is furthest
advanced is caspofungin (MK-991,
L-743,872) (121), following its approval by the
FDA, early in 2001, for the treatment of aspergillosis.
The two analogs, LY-303366 and
caspofungin, have been compared against clinical
fungal isolates in vitro (154) and the latter
has been evaluated in immunosuppressed
mice (155).


(121) caspofungin

6 CARDIOVASCULAR DRUGS

6.1 Lovastatin, Simvastatin, and Pravastatin

One of the most significant natural product
discoveries in the last 25 years has been a fungal
secondary metabolite called lovastatin
(122). Heralded as a major breakthrough in
the treatment of coronary heart disease (156),
lovastatin was introduced onto the market by
Merck in 1987 for the treatment of hypercholesterolemia,
a condition marked by elevated
levels of cholesterol in the blood.



(122)

Lovastatin works by inhibiting 3-hydroxy-3-methylglutaryl
coenzyme A (HMG-CoA) reductase,
a key rate-limiting enzyme in the cholesterol
biosynthetic pathway. However, the
first specific inhibitors of this enzyme were
discovered several years earlier by Endo et al.
at Sankyo (157). The compounds, which are
structurally related to lovastatin, were isolated
from Penicillium citrinum and shown to
block cholesterol synthesis in rats and lower
cholesterol levels in the blood. Development of
the most active compound, designated ML-236B
(123), is believed to have been curtailed because
of toxicity problems (158).



(123) 

Brown et al. at Beechams also reported
the isolation of (123), but as a metabolite
from Penicillium brevicompactum (159). The
group, naming the compound compactin, reported
its antifungal activity but failed to reveal
its mode of action as an inhibitor of HMG-CoA
reductase. The search for naturally
occurring inhibitors of HMG-CoA reductase
gained pace and, after spending several years
developing appropriate screens, Merck found
during only the second week of testing a culture
of Aspergillus terreus that displayed interesting
inhibitory activity (160). In February
1979 the active component, lovastatin
(mevinolin), was isolated and characterized
(161), and in November the following year
Merck was granted patent protection in the
United States. Although lovastatin proved to
be identical to monacolin K, a metabolite isolated
earlier from Monascus ruber (162), the
chemical structure of the latter compound had
not been reported, whereas Merck filed for
patent protection giving complete structural
details for lovastatin.

The discovery of compactin and lovastatin
prompted efforts to develop derivatives with
improved biological properties (163, 164).
Modification of the methylbutyryl side chain
of lovastatin led to a series of new ester derivatives
with varying potency and, in particular,
introduction of an additional methyl group α
to the carbonyl gave a compound with 2.5
times the intrinsic enzyme activity of lovastatin
(165). The new derivative, named simvastatin
(124), was the second HMG-CoA reductase
inhibitor to be marketed by Merck.
Both lovastatin and simvastatin are prodrugs
and are hydrolyzed to their active open-chain
dihydroxy acid forms in the liver (166). A third
compound, pravastatin (125), launched by
Sankyo and Squibb in 1989, is the open hydroxyacid
form of compactin that was first
identified as a urinary metabolite in dogs.
Pravastatin is produced by microbial biotransformation
of compactin.

The HMG-CoA reductase inhibitors described
above bind to two active sites on the
enzyme: the hydroxymethylglutaryl binding
domain and an adjacent hydrophobic pocket to
which the decalin moiety binds (167). The recognition
that the ring-opened hydroxy acids
resemble mevalonic acid and that the decalin
moiety could be replaced by 4-fluorophenyl-substituted
heterocycles led to the launch of
several new products including fluvastatin
(126), the ill-fated cerivastatin (127), and the
so-called turbo statin atorvastatin (128). Although
cerivastatin was withdrawn from the
market in 2001 because of fatal adverse drug-drug
interactions, the "statins" remain one of
the fastest growing segments of the pharmaceutical
industry. The latest member of this
group of cholesterol-lowering drugs, AstraZeneca's
rosuvastatin (129), is due to be
launched in 2002 and is forecast to achieve
sales of US $2.8 billion by 2005 (168).

(125)
(126) Fluvastatin
(127) Cerivastatin
(129) Rosuvastatin

6.2 Teprotide and Captopril 

While studying the physiological effects of
snake poisoning, Ferreira (169) discovered
that specific components in the venom of the
pit viper Bothrops jararaca inhibited degradation
of the peptide bradykinin and potentiated
its hypotensive action. The "potentiating factors"
proved to be a family of peptides that
worked by inhibiting the dipeptidyl carboxypeptidase,
angiotensin-converting enzyme (ACE)
(170, 171). In addition to catalyzing the degradation
of bradykinin, ACE also catalyzes the
conversion of the human prohormone, angiotensin
I, to the potent vasoconstrictor octapeptide,
angiotensin II. However, the significance of ACE in
the pathogenesis of hypertension was not fully
appreciated until the 1970s after Ondetti et al.
(172) had first isolated and then synthesized
the naturally occurring nonapeptide, teprotide
(130). The compound proved to be a specific,
potent inhibitor of ACE and showed excellent
antihypertensive properties in clinical
trials, although its use was limited by the lack
of oral activity.

Pyr-Trp-Pro-Arg-Pro-Gln-Ile-Pro-Pro

(130) 

The discovery of teprotide led to a search
for new, specific, orally active ACE inhibitors.
Ondetti et al. (172) proposed a hypothetical
model of the active site of ACE, based on analogy
with pancreatic carboxypeptidase A, and
used it to predict and design compounds that
would occupy the carboxy-terminal binding
site of the enzyme. Carboxyalkanoyl and mercaptoalkanoyl
derivatives of proline were
found to act as potent, specific inhibitors of
ACE, and 2-D-methyl-3-mercaptopropanoyl-L-proline
(131) (captopril) was developed and
launched in 1981 as an orally active treatment
for patients with severe or advanced hypertension.
Captopril, modeled on the biologically active
peptides found in the venom of the pit
viper, made an important contribution to the
understanding of hypertension and paved the
way for other ACE inhibitors, such as enalapril
(132) and lisinopril, which have had a major
impact on the treatment of cardiovascular
disease (173).




6.3 Adrenaline, Propranolol, and Atenolol 

The true clinical potential of β-adrenoceptor
blocking agents for treating angina, atrial fibrillation,
and tachycardias was first recognized
by James Black and colleagues at ICI
(174). Black noted a report from Neil Moran of
Emory University in 1958, showing that dichloroisoprenaline
antagonized the effects of
adrenaline on heart rate and muscle tension.
The first effective β-adrenoceptor blocker,
pronethalol (133), was synthesized 2 years
later by the ICI group and marketed for limited
use in 1963. Toxicity problems soon led
pronethalol to be replaced by the 1-naphthyl
analog, propranolol (134), which became the
first β-adrenoceptor antagonist approved for
general use, being more potent and yet devoid
of the partial agonist or intrinsic sympathomimetic
activity shown by many other analogs.
Compounds with improved selectivity for the
β-adrenoceptor of cardiac muscle (β1-adrenoceptor
blockers) were to follow, including
atenolol (135), which became the most frequently
prescribed β-blocker and one of the
best-selling drugs of the time.



(134) Propranolol 



(135) Atenolol 


6.4 Dicoumarol and Warfarin 

Sweet clover has a long history of medicinal 
use, often as an anti-inflammatory or analgesic
preparation in the form of ointments and 
poultices. Melilotus officinalis (yellow sweet 
clover, or ribbed melilot) was reputed to have 
been a favorite herbal treatment used by King 
Henry VIII of England and the plant is still 
referred to as King's Clover in some publica¬ 
tions (175). 

The plant flourishes in poor soil and was
cultivated extensively in Europe for cattle fodder
and for soil improvement. In the early
1920s M. officinalis was planted on the prairies
of North Dakota and Alberta, Canada, but
with disastrous consequences. Soon cattle and
sheep throughout these regions began literally
bleeding to death. The mysterious hemorrhagic
disease was traced to clover fodder that
had not been stored properly and had become
"spoiled," or moldy. However, the insolubility
of the anticoagulant component and the difficulty
of assaying extracts for biological activity
made the task of isolating the active principal
component intractable (176). It took
almost 20 years before the compound was
identified as 3,3'-methylenebis(4-hydroxycoumarin)
(136), an oxidative degradation metabolite
of coumarin (137), itself a common component
of Melilotus sp. Soon after the
compound had been identified, trials were initiated
that confirmed the oral anticoagulant
activity in humans and in 1942 it was marketed
under the name dicoumarol (177). The
compound had a slow, erratic onset of action
and efforts were initiated to prepare synthetic
analogs that acted faster and had longer duration
of action. A 4-hydroxycoumarin residue,
substituted at the 3-position, proved essential
for biological activity and in 1948, after synthesizing
over 150 compounds, a 4-hydroxycoumarin
derivative that was longer acting
and more potent than dicoumarol was selected
not for clinical use, but as a rodenticide for
development by the Wisconsin Alumni Research
Foundation! The compound (138),
named warfarin (an acronym derived from the
name of the institute coupled with "arin" from
coumarin), became a household name for rat
poison. Concern over the use of oral anticoagulants
and the inherent risk of hemorrhage
inhibited the development of warfarin
as a therapeutic agent. However, in 1951, a
U.S. Army cadet unsuccessfully attempted
to commit suicide by taking massive doses of
the compound. The incident prompted further
clinical trials that resulted in warfarin
being used as the anticoagulant of choice for
prevention of thromboembolic disease (177).

(136)
(138)

The coumarin anticoagulants act by blocking
the regeneration of reduced vitamin K, which
induces a state of functional vitamin K deficiency
and thus interferes
with the blood-clotting mechanism (178).

7 ANTIASTHMA DRUGS 

7.1 Khellin and Sodium Cromoglycate 

The toothpick plant, Ammi visnaga, had been
used for centuries in Egypt as an antispasmodic
agent to treat renal colic and ureteral
spasm. In 1879 one of the plant's main constit¬ 
uents was isolated, crystallized, and named 
khellin (139) (179). Subsequently, the pure 
compound was shown to relax smooth muscle 
and in 1938 the chemical structure was char¬ 
acterized as a chromone derivative (180). In 
1945 a medical technician took khellin to treat 
renal colic and found instead that it acted as a 
potent coronary vasodilator and relieved his 
angina (181). This chance discovery, together 
with earlier observations, led to khellin being 
used as a coronary artery vasodilator and for 
treating bronchial asthma (182). However, its 
clinical use was severely limited by some un¬ 
pleasant gastrointestinal side effects. 



(139) 

Five years later, a small British pharmaceutical
company, called Benger Laboratories,
initiated a program to synthesize khellin analogs
as potential bronchodilators for treating
asthma, and had prepared a series of compounds
that relaxed guinea pig bronchial
smooth muscle and protected the animals
against allergen-induced bronchospasm (183).

A clinical pharmacologist on Benger's staff,
who suffered from chronic asthma, questioned 
the validity of the animal model and decided 
instead to test the compounds on himself. He 
then prepared a "soup" of guinea pig fur, in¬ 
haled the vapors to induce a reproducible 
asthma attack, and assessed the effects of the 
synthesized khellin derivatives. Many of the 
compounds first prepared were insoluble in 
water and caused nausea and other unpleas¬ 
ant side effects when taken orally. This led to 
the test compounds being formulated as aero¬ 
sol sprays and in 1958, an aerosol preparation 
of a chromone-2-carboxylic acid derivative 
(140) was found to exert a protectant effect, 
albeit short lived, against bronchial allergen 
challenge without showing the bronchodilator 
activity seen with other compounds. The com¬ 
pound was completely inactive in the guinea 
pig asthma model and afforded its protectant 
effect in humans only when inhaled as an 
aerosol. 



About two new compounds were tested 
each week and in 1965, after synthesizing 
some 670 analogs, a bischromone was pre¬ 
pared that gave good protection, even when 
inhaled up to 6 h before bronchial allergen 
challenge (184). The compound sodium cro¬ 
moglycate (141) was obtained by condensing 
diethyl oxalate with the bis(hydroxy acetophe¬ 
none) (142) and cyclizing the resultant 
bis(2,4-dioxobutyric acid) ester (143) under 
acidic conditions (185). The essential chemical
features required for activity appeared to be
the coplanarity of the chromone nuclei, the
flexible dioxyalkyl link, and the carboxyl
groups in the 2-positions. Sodium cromoglycate is
believed to act by stabilizing tissue mast cells
against degranulation, thereby preventing release
of inflammatory mediators (186).

(141)


Sodium cromoglycate entered clinical trials 
in 1967 and emerged to become a first-line pro¬ 
phylactic treatment for bronchial asthma. 

The coronary dilator properties of khellin 
have not been ignored and at least one suc¬ 
cessful program was initiated to prepare ana¬ 
logs for testing as potential antiangina drugs 
(187, 188). Benziodarone (144) was the first 
useful compound to emerge from the Labaz 
laboratories in Belgium based on the benzofu- 
ran ring system. However, the compound 
caused hepatotoxicity in man and was soon 
superseded by amiodarone (145), a more potent
coronary dilator for treating angina. In
1970 the first report of antiarrhythmic activity
in the clinic was published (189) and amiodarone
became established for prophylactic
control of supraventricular and ventricular
arrhythmias during the 1980s (188).

(145) amiodarone

7.2 Ephedrine, Isoprenaline, and Salbutamol

The Chinese have been using a plant extract
known as ma huang to treat asthma and hay
fever for thousands of years. The extract is
prepared from several species of Ephedra, a
small leafless shrub found in China. Following
experiments at the Peking Union Medical College
and then at the University of Pennsylvania
and the Mayo Clinic in the United States,
the active ingredient, ephedrine (146), was introduced
into Western medicine in 1926 as an
orally active bronchodilator for the treatment
of acute asthma (190, 191).

(146)



Ephedrine is related to another natural
product that has been used to treat asthma,
that is, the adrenal hormone adrenaline (147)
(epinephrine). Adrenaline is a potent agonist
of both α- and β-adrenoceptors and thus produces
arterial hypertension as an undesirable
side effect. In 1951 a synthetic alternative,
isoprenaline (148), was introduced and for almost
20 years it was considered the drug of
choice for treating bronchospasm associated
with acute asthmatic attack (191). Isoprenaline
is a specific β-adrenoceptor agonist and,
although it has no vasoconstrictor activity, the
compound does have marked cardiac stimulant
properties and a short duration of action.
Ahlquist's concept (192) of two types of adrenoceptor
was developed further by Lands et al.
(193), who established the existence of β1- and
β2-adrenoceptor subtypes. Clear structure-activity
relationships emerged with the preparation
of compounds related to adrenaline and
ephedrine; the basic requirement for β-adrenoceptor
agonist activity was an aromatic ring
substituted by an ethanolamine side-chain.
The branched methyl substituent on the side-chain
was associated with prolonged duration
of action (as in ephedrine), whereas aromatic
hydroxylation (as in isoprenaline) prevented
penetration across the blood-brain barrier
and thus prevented stimulation of the CNS
(191). However, 1,2-dihydroxy substituents
were found to promote enzymic degradation,
and replacement of the 3-hydroxy group by a
hydroxymethyl substituent was required to
extend the duration of action. In 1969 salbutamol
(149) was launched by Glaxo as a longer-lasting,
selective β2-adrenoceptor agonist
for the treatment of bronchial asthma (194)
and, recently, a lipophilic ether analog, salmeterol
(150), was introduced with an even longer
duration of action that has potential advantage
in the prevention of nocturnal asthma.


(147)



Despite the many chemical alterations that
have been carried out on the phenylethanolamine
"template," the key chemical features
associated with modern β-agonists can be seen
to have originated from the naturally occurring
compounds, adrenaline and ephedrine.

(150)

7.3 Contignasterol 

The use of inhaled corticosteroids such as flu¬ 
ticasone propionate to treat asthma and rhini¬ 
tis has been well documented and will not be 
repeated here. Less well known is an unusual, 
highly oxygenated marine-derived steroid 
isolated from the sponge Petrosia contignata 
that possesses a unique cyclic hemiacetal side-
chain (151). The compound was isolated by
Andersen and coworkers (195) at the Univer¬ 
sity of British Columbia and found to possess 
anti-inflammatory properties in vivo. Conti¬ 
gnasterol is being developed by Inflazyme, in 
collaboration with Aventis, for the treatment 
of asthma and other inflammatory diseases 
and has progressed to phase II clinical trials. 





8 ANTIPARASITIC DRUGS 

8.1 Artemisinin, Artemether, and Arteether 

Artemisia annua (sweet wormwood, qing hao)
has been used in Chinese medicine for well over
1000 years. The earliest recommendation is for
the treatment of hemorrhoids, but there is a
written record of use in fevers dated 340 A.D.
Modern development dates from the isolation of
a highly active antimalarial, artemisinin (qinghaosu),
in 1972, and has been carried out almost
entirely in China. Much of the original literature
is therefore in Chinese, but there is an excellent
review on qinghaosu by Trigg (196) and an account
of the uses of A. annua (197). This section
is largely a summary of these two articles.

Artemisinin (152) is a sesquiterpene lactone
with an unusual peroxide bridge. One of
the earliest modifications involved catalytic
reduction of the peroxide, resulting in loss of
one oxygen and total loss of antimalarial
activity (196) in the adduct (153). The role of the
peroxide bridge in producing antimalarial
effects was not fully understood, but it appeared
essential for activity, so much of the early
work on analogs conserved this structural
feature as an empirical finding. The mechanism
of action of artemisinin has since been
elucidated (198, 199), although it is not without
controversy (200, 201). The drug has a high
affinity for hemozoin, a storage form of hemin
that is retained by the parasite after digestion
of hemoglobin, leading to a highly selective
accumulation of the drug by the parasite.
Artemisinin then decomposes in the presence of
iron, probably from the hemozoin, and
releases free radicals, which kill the parasite.
The peroxide bridge is therefore a crucial part
of the drug molecule, as was suspected from
structure-activity studies. Elucidation of the
mechanism of action has led to the synthesis of
a range of simple analogs capable of
iron-catalyzed decomposition, some of which have
good antimalarial activity (202).

(151) Contignasterol

(153)

In retrospect, it is not surprising that the
peroxide-bridged compound (154), isolated
from Artabotrys uncinatus, also has
antimalarial activity (197). Because peroxides of this
kind are likely to be formed from a variety of
precursors in dried plant material (see below),
there may well be many more antimalarials of
this kind to be found.


(154)

Artemisinin is an excellent antimalarial,
approximately equal in potency to chloroquine,
with a good therapeutic index except on
the fetus. The preparation of semisynthetic
derivatives has been stimulated primarily by a
requirement for improved solubility because
artemisinin is relatively insoluble in both
water and oil.

Reduction of (152) with sodium borohydride
occurs at the lactone carbonyl, leaving
the peroxide intact (196, 197). The resulting
cyclic hemiacetal, dihydroartemisinin (155),
which is a more potent antimalarial than the
parent compound, shows typical acetal
reactivity. In the presence of acid, a highly reactive
carbocation intermediate allows SN1-type
substitution with a variety of nucleophiles.
For example, boron trifluoride catalyzes
reactions with methanol and ethanol to give
artemether (156) and arteether (157),
respectively, two of the most important derivatives
(196). Both are more potent than the parent
compound and have improved solubility in oil.
Artemether has been chosen for development
in the West under the name Paluther.



(156) R = CH3 artemether

(157) R = CH2CH3 arteether

(158) R = COCH2CH2COONa sodium artesunate


Water solubility can be greatly improved by
the standard ploy of esterification with
succinic acid and conversion to the sodium salt.
Applied to compound (155), this technique
gives sodium artesunate (158), a water-soluble
prodrug that may be given intravenously
(196). It may be assumed that hydrolysis
occurs in vivo to give back (155) as the active
antimalarial because (158) has been shown to
be unstable in aqueous solution and because
analogous carboxylic acids with a
nonhydrolyzable ether link are relatively inactive.

There are two reasons for the great interest
being shown in artemisinin and its
derivatives. First, there is little cross resistance with
Plasmodium falciparum between the
members of this series and the quinoline-based
antimalarials like chloroquine (203). On the
contrary, significant potentiation of effect is
observed in combination with chloroquine
analogs such as mefloquine (204). Second, the
high lipid solubility of, for example,
artemether ensures rapid penetration into the
CNS, so these sesquiterpene lactones are
first-line drugs for the treatment of cerebral
malaria caused by P. falciparum (197), which is
otherwise fatal.

It seems highly likely (205) that most of the
artemisinin found in dried plant material is
formed by autoxidation after the death of the
plant. From the medicinal chemist's point of
view this is unimportant, but some plant
biochemists might have doubts about the
description of artemisinin as a "natural product." In
our view, air drying in sunlight is a natural,
although not a botanical, process. It is
probable that many other plant-derived peroxides
are formed in a similar way.

Whole plant extracts often show promising
activity that may not be traceable to single
components. This is obviously not true of
Artemisia annua extracts, but it is interesting to
note that other constituents, notably
methoxylated flavones, have potentiating effects
on the antimalarial activity of artemisinin
(206).

The reported effect of artemisinin on
systemic lupus erythematosus (196) is intriguing,
given the history of use of quinine-type
antimalarials in this disease.

8.2 Quinine, Chloroquine, and Mefloquine 

The use of Cinchona bark (e.g., Cinchona
succirubra) by South American Indians to treat
fevers and the subsequent importation of the
bark into Europe by Jesuit priests in the 17th
century are well known (207). At that time
malaria was widespread, even as far north as
eastern Scotland, and there was no effective
treatment for "the ague." Although quinine
(159) is not very potent or long acting, a good
sample of Cinchona bark contains about 5% of
the alkaloid (208). This high concentration
permitted genuinely therapeutic doses of bark
to be given and allowed the pure alkaloid to be
isolated (209) as early as 1820. During the
next 100 years quinine was the only effective
treatment for malaria known to Europeans.
Without quinine, life in the tropics was
impossible for those without natural immunity to
malaria. "One thing that was compulsory was
the taking of five grains of quinine a
day. ... And if you didn't take it and got ill
your salary was liable to be stopped" (210).
Supplies of quinine to Europe were threatened
during World War I, stimulating a major
program of research into synthetic analogs.



The chemical techniques available to
chemists in the period 1820-1920, although
improving rapidly, did not allow a structure to be
proposed for quinine with any confidence: the
first completely correct proposal (211) came in
1922 and was finally confirmed by total
synthesis (212) as late as 1945. However, part
structures were known, such as the
6-methoxyquinoline moiety, from long before, and
were sufficient to allow the synthesis of
mimics. The first clinically successful mimics were
the 8-aminoquinolines.

In the early years of the 20th century,
synthetic organic chemistry was a young
discipline, largely governed by empirical rules.
Progress toward synthetic analogs of complex
natural structures was governed as much by
synthetic feasibility as by a desire for close
mimicry. The first quinine analogs were,
therefore, a combination of the accessible
6-methoxyquinoline part of the quinine
structure, with elements of the first successful
antimicrobial agents, such as 9-aminoacridine.
Nitration followed by reduction could be used
to generate a number of new molecules from a
variety of parent heterocycles. It is recorded
(213) that 4-, 6-, and 8-aminoquinolines have
antimalarial properties and, quite
extraordinarily, two of these chemical classes are still
used today, have quite different uses as
antimalarials, and quite possibly have different
modes of action.

The first of the 8-aminoquinolines to be
introduced into medicine was pamaquine (160),
not long after World War I (214). Despite
greater toxicity than that of quinine, this class
of drugs was found to have radical curative
ability against the relapsing malarias. Several
hundred analogs were tested during World
War II and of these, primaquine (161)
survives to the present day for short-term use as a
radical curative (215).



(160) pamaquine 





(161) primaquine 

Quinacrine (162) is an obvious
embodiment of the principle outlined above; as a
derivative of both quinine and 9-aminoacridine
it combined a known antimalarial with a
known antimicrobial. The result was a useful,
relatively nontoxic antimalarial, although it
stained the skin and eyeballs yellow (216).
Despite this side effect and a high incidence of
gastrointestinal disturbance, quinacrine was
widely used during World War II by European
troops in East Asia. The availability of the
results of medicinal chemistry research to both
sides in wartime is a curious feature of
antimalarial development, highlighted below.

(162) quinacrine (mepacrine)



As has been explained, the major stimulus
for research into synthetic antimalarials was
not so much the therapeutic inadequacy of
quinine as the potential lack of availability in
times of social upheaval. During World War II,
the United States encouraged the planting of
Cinchona in Costa Rica, Peru, and Ecuador
(216). The total synthesis of quinine was too
difficult in the 1940s and is unlikely to become
economically viable even in the new
millennium. This problem was partly overcome with
quinacrine, which was used widely in World
War II, although quinacrine has the defects
described above. The conceptual derivation of
chloroquine (163) from quinacrine is obvious
and apparently happened twice, in Germany
and the United States, the latter about 10
years after the Germans had discarded the
drug as being too toxic! The story of the
rediscovery of chloroquine is fascinating, as an
account of human muddle and misjudgment,
finally leading to an extraordinarily valuable
drug (216).



(163) chloroquine


Over decades of sublethal exposure the
resistance of all types of malaria has increased to
a point where chloroquine no longer offers
certain protection (217). With the partial
exception of quinine and dihydroquinine (218),
resistance to antimalarials had reached the
stage at the time of the Vietnam War where
more research was required. The development
of mefloquine (164) was a continuation of the
World War II effort, with a gap of about 20
years. Resistance to chloroquine had
developed widely during that period, but
surprisingly less so to quinine, given the obvious
similarities in structure. This observation
stimulated a reappraisal of the quinolines known
as quinoline methanols, which bear a hydroxy
group on the α-carbon of a substituent
attached to the 4-position (219). Up to 1944, a
total of 177 quinoline methanols had been
synthesized and tested, resulting in one
compound (165) with activity superior to that of
quinine. In human volunteers there was a
high incidence of phototoxicity associated
with (165), so research on quinoline
methanols had ceased in 1944 in favor of the
4-amino series, which included chloroquine.
Reappraisal of about 100 of the World War II
compounds confirmed the high activity and
phototoxicity of (165) and also showed the
high potency of an analog (166), which had
reduced phototoxicity (219). These data,
together with results from about 200 newer
compounds, fostered the belief that
phototoxicity was separable from antimalarial activity.
Extensive evaluation of (166) in humans with
chloroquine-resistant Plasmodium
falciparum infections showed promise, but with a
significant incidence of toxic reactions; the dose
required was also inconveniently large.

Two hypotheses concerning the effect of
the 2-phenyl substituent were proposed. One
was that metabolic oxidation was blocked at
this position, so that duration of action was
prolonged, which was considered desirable.
Second, the UV chromophore was enlarged,
which would increase the likelihood of
drug-induced photosensitivity. The phenyl
substituent was thus replaced by trifluoromethyl
in the 2-position (220). Before the first such
derivatives were tested, further analogs were
prepared with an additional trifluoromethyl
group on the benzene ring. This was
serendipitous because the first series of
2-trifluoromethyl analogs had low potency and were also
photosensitizing. The compounds with two
trifluoromethyl groups, one at position 2 and
another in the 6-, 7-, or 8-position, were all potent
and free from phototoxicity (221). The most
potent was mefloquine (164), a very successful
drug but one that produces unacceptable CNS
effects in a small proportion of users (222);
parasite resistance has also been observed in
parts of Southeast Asia (217). There is now a
serious attempt by the World Health
Organization to find new antimalarials.

(164) mefloquine

Physicians are pragmatic when choosing
therapy for patients whose suffering is not
alleviated by accepted methods. A drug that has
been shown to be toxicologically safe may be
utilized in a new area for the flimsiest of
reasons. Thus Page (223) described his use of
quinacrine in two cases of lupus
erythematosus as being based on "[a] chance observation
...," although he did not describe the
observation that led to his decision. He did,
however, record that quinine had been tried
previously and "prevented extension of the
lesions," so this may have been the basis for
his rationale. In any event, the beneficial
effects of quinacrine were remarkable and
appeared to be related to the degree of yellowing
of the skin that, as described earlier, is a
common side effect of the use of quinacrine in
malaria.

Among Page's group of patients with lupus
erythematosus were two with rheumatoid
arthritis, whose symptoms also responded to
treatment with quinacrine. The following
year, other physicians (224) conducted a trial
of quinacrine on a larger group of patients
with rheumatoid arthritis; the results
encouraged Haydu (225) to test chloroquine on
similar patients, again with positive results. A year
later, two more physicians (226) compared
quinacrine with chloroquine and found the
latter to be better tolerated, the majority of
patients gaining some benefit. Both
quinacrine and chloroquine caused gastrointestinal
disturbances, which led to a trial (227) of
hydroxychloroquine (167), an unsuccessful
antimalarial but with less effect on the gut, thus
allowing larger doses to be given.
Hydroxychloroquine has remained part of the standard
drug therapy for rheumatoid arthritis ever
since.



(167) hydroxychloroquine 


So far, the choice of quinine-like drugs to
treat rheumatoid arthritis has been based on
preliminary selection as antimalarials.
Because the two types of action are presumably
unconnected, there might be some value in a
screening program aimed directly at
rheumatoid disease.

8.3 Avermectins and Milbemycins 

There is no major distinction between the
avermectins and milbemycins, which are
based on the same complex polyketide
macrocycle (168): the avermectins are oxygenated at
C-13 and bear a disaccharide on this oxygen.
They have been isolated from cultures of a
number of Streptomyces species, obtained
from all over the world (228).

The avermectins, particularly, have been
the subject of intense commercial interest
because they possess potent activity against both
nematode and arthropod parasites of livestock
(229). A full discussion of structure-activity
relationships would be out of place here, not
least because the data are voluminous, so we
shall concentrate on the development of
ivermectin, which has been a major success.

Structural designation of avermectins is
quaintly based on three series: A, B; a, b; and
1, 2. These are illustrated diagrammatically.
Greater activity resides in the B series, with a
free OH at position 5. There is little difference
in potency between the a and b series. In the
more potent B series there are important
differences between the 1 series and the 2 series;
B1 is the more active orally, whereas B2 is the
more potent by injection. There are also
differences in their spectrum of activity (230). The
spectrum of activity was kept as broad as
possible by hydrogenation of a mixture of
avermectins B1a and B1b to give ivermectin (169),
which contains at least 80% of
22,23-dihydroavermectin B1a and not more than 20% of
22,23-dihydroavermectin B1b.
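
The three-part naming scheme described above can be made concrete with a small lookup. The following is only an illustrative sketch: it transcribes the descriptor assignments (Z, V-W, and X) given in the legend to the avermectin structure and the composition limits quoted in the text, whereas the function names, dictionary layout, and percentage-based check are assumptions introduced here and do not come from the source or from any published tool.

```python
# Substituent assignments transcribed from the legend to the avermectin
# structure. The table and function names are hypothetical helpers.
SERIES_Z = {"A": "CH3", "B": "H"}              # Z on the position-5 oxygen
SERIES_VW = {"1": "CH=CH", "2": "CH2CH(OH)"}   # the V-W (22,23) unit
SERIES_X = {"a": "CH(CH3)CH2CH3", "b": "CH(CH3)2"}


def decode_avermectin(name: str) -> dict:
    """Split a designation such as 'B1a' into its three descriptors."""
    if len(name) != 3:
        raise ValueError(f"expected a three-character designation, got {name!r}")
    major, number, minor = name
    try:
        return {
            "Z": SERIES_Z[major],
            "V-W": SERIES_VW[number],
            "X": SERIES_X[minor],
        }
    except KeyError as exc:
        raise ValueError(f"unknown descriptor {exc.args[0]!r} in {name!r}") from None


def dihydro_component(name: str) -> dict:
    """Model hydrogenation of the 22,23 double bond: V-W becomes CH2CH2."""
    parts = decode_avermectin(name)
    if parts["V-W"] != "CH=CH":
        raise ValueError("only the 1 series has a 22,23 double bond to reduce")
    return {**parts, "V-W": "CH2CH2"}


def within_ivermectin_limits(pct_dihydro_b1a: float, pct_dihydro_b1b: float) -> bool:
    """Check the quoted limits: at least 80% B1a, not more than 20% B1b."""
    return pct_dihydro_b1a >= 80.0 and pct_dihydro_b1b <= 20.0
```

For example, `decode_avermectin("B1a")` yields Z = H (the free OH at position 5), V-W = CH=CH, and X = CH(CH3)CH2CH3, and `dihydro_component("B1a")` gives the major ivermectin component with V-W = CH2CH2.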

Ivermectin was developed for, and has been
highly successful in, the treatment and control
of parasites in cattle, horses, sheep, pigs, and
dogs. Following studies in humans with river
blindness (onchocerciasis) (231-233), the
developers of ivermectin (Mectizan) have
participated in a major program aimed at
eradication of the disease. The sufferers inhabit some
of the poorest parts of Africa and cannot pay
for their treatment, so the drug has been
donated by Merck and Co. Since 1996 more than
20 million treatments have been given (234).
The drug does not kill the adult worms that
cause onchocerciasis (235), but is useful in
interrupting the life cycle (236). Ivermectin is
also of value in treatment of scabies (237). A
great deal of information on the biological
aspects of the use of ivermectin has recently
been summarized (238).

avermectins: R = disaccharide linked through the C-13 oxygen

milbemycins: R = H

In the avermectins the series are designated as follows (Y = CH3):

A, Z = CH3

B, Z = H

a, X = CH(CH3)CH2CH3

b, X = CH(CH3)2

1, V-W = CH=CH

2, V-W = CH2CH(OH)

For further details of these descriptors in the milbemycins, see Ref. 228.

In ivermectin (169), V-W = CH2CH2, X = CH(CH3)CH2CH3 (major) or
CH(CH3)2 (minor), Y = CH3, and Z = H.

(169) X = CH(CH3)CH2CH3 (major) or CH(CH3)2 (minor)

9 CONCLUSION

Natural product research has been the single
most successful strategy for discovering new
pharmaceuticals and has contributed
dramatically to extending human life and improving
clinical practice. As long as Nature continues
to yield novel, diverse chemical entities
possessing selective biological activities, natural
products will play an important role as leads
for new pharmaceuticals. An interesting
recent example is the alkaloid galantamine
(Nivalin, Reminyl) (170), originally isolated from
the bulbs of the Amaryllidaceae family
(snowdrops, daffodils, etc.), which has found use in
the symptomatic treatment of Alzheimer's
Disease (239). It is a reversible and
competitive inhibitor of acetylcholinesterase that also
interacts allosterically with nicotinic
acetylcholine receptors to potentiate the action of
agonists. By enhancing the reduced
central cholinergic function associated with
this disease, the drug has produced significant
improvements in cognition and behavioral
symptoms in patients. In this case it is the alkaloid
itself that is used as the active compound and
it will be interesting to see whether
development leads to better drugs. There are as yet
relatively few publications in this area,
although Sanochemia is interested (240, 241).


(170)

Over 90% of bacterial, fungal, and plant
species are still waiting to be investigated
(242). High throughput screening methods
will allow even greater numbers of samples to
be tested against more biological targets (243,
244), although this approach sometimes
produces more data than can be conveniently
integrated into a research program. An
alternative view is that the elucidation of the
biological effects of chosen compounds, in
some detail, will yield insight into biological
processes that may open avenues for
medicinal chemistry research that is not based on
pure chance. This view is based on the
recognition that secondary metabolites have been
produced and ruthlessly selected, by
evolution, over a long period of time. Either way,
the medicinal chemist has a wonderful
opportunity to continue utilizing the rich chemical
diversity offered by nature, as is shown in two
recent reviews that explore this topic in some
detail (245, 246).

The best approach for the identification of
natural product leads is a matter of debate.
Some very inventive techniques have been
used in the bioassay-guided method; for
example, by spraying TLC plates with reactive
media that respond by producing a color change
in the presence of an active compound. An
alternative is to use an ethnobotanical or
ethnopharmacological technique, whereby the
accumulated wisdom of many generations of
native plant users may be harnessed in the
search for better medicines for all. These two
techniques may be combined, so that the
native people describe the uses to which they put
the plant and the researchers devise a
bioassay that is used to find the active components.
The problem with any bioassay-guided
technique, however, is that the inactive
constituents are not identified. This represents a
considerable waste, given that the plant has had
to be collected, preserved, and identified. An
alternative view is that it is best to extract all
the constituents, with a view to screening in
whichever way is appropriate, at that time or
in the future. With modern high-performance
liquid chromatography facilities it is possible
to reduce a plant to its secondary metabolites,
as single compounds, in a few days: the
products are then able to be screened in a high
throughput manner in an equally short time
and the compounds can be reevaluated when
new screens become available. One thing is
certain: the variety of natural product
structures, after perhaps 300 million years of
natural selection, far exceeds the bounds of human
imagination, unlike the typical output from
combinatorial chemistry!
Cephamandole, 872, 873
Cephapririn, 871, 874 
Cephradine, 872 
Cerivastatin, 880 
Cetirizine, 783 
Cetirizine dihydrochloride 
chromatographic separation, 
790-791 
cGMP 

molecular property visualiza¬ 
tion, 137 
CGS 27023

structure-based design, 444, 
446 

Chain branching alteration ana¬ 
logs, 699-704 
Chapman databases, 387 
Charge-charge interactions, 82 
Charge-coupled devices (CCDs) 
for electron cryomicroscopy, 
623 

for X-ray crystallography, 474 
Charge-dipole interactions, 82 
Charge parameterization, 
101-102 

Charge state determination 
NMR spectroscopy for, 526 
Charge transfer energy, 173 
CHARMM, 298,299,307-308 
in molecular modeling, 118, 
126 

CheD, 387 
ChemBase, 362 
CHEMCATS, 385 
CHEMDBS3D, 260,363 
ChemDraw, 362 
ChemEnlighten, 387 
ChemExplorer, 384 
ChemFinder for Word, 384,388 
ChemFolder, 388 
Chemical Abstracts (CAS) regis¬ 
try file, 50 

Chemical Abstracts Service da¬ 
tabases, 254,361,385 
Chemical business rules, 378, 
403 

Chemical information comput¬ 
ing systems, 357-363 
chemical property estimation 
systems, 388-390 
chemical representation, 
363-373 

databases, 384-388 
data warehouses and data 
marts, 390-393, 402-403 
future developments, 393-397 
glossary of terms used,
397-412 

registering chemical informa¬ 
tion, 377-379 

searching chemical structures/reactions, 379-384
storing chemical information, 
373-377 

Chemical information manage¬ 
ment databases, 384 
Chemical information manage¬ 
ment systems, 384 


Chemical libraries, See Libraries 
Chemical Products Index, 
391-392 

Chemical property estimation 
systems, 388-390
Chemical reactions, 366 
searching, 379-384 
Chemical representation, 
363-373 

Chemical shift, in NMR, 511, 

512 

changes on binding, 536-537 
perturbations as aid in NMR 
screening, 562-568 
Chemical-shift mapping, in 
NMR, 543-545 
Chemical similarity, 382 
Chemical space, 244, 383,400 
exploring with molecular simi¬ 
larity/diversity methods, 
188, 191 

reduction by virtual screening, 
244-245 

Chemical Structure Association, 
360 

Chemical structures 
file conversion, 372-373 
searching, 379-384 
Chemical suppliers searching, 
384 

CHEM-INLO, 360 
ChemInform, 386
Cheminformatics, 359,400 
Cheminformatics Glossary, 360 
ChemPort program, 385 
Chemscape, 387 
ChemScore 

consensus scoring, 266 
empirical scoring, 310 

ChemSpace, 199 
ChemText, 362 
ChemWindow, 388 

Chem-X, 60, 111 
ChemX/ChemDiverse 
3D pharmacophores, 195-196, 
206 

optimization approach, 217 
and property-based design, 

234 

Cherry picking, combinatorial 
libraries, 216-217,237 
Chesire, 378,387 
Chicken liver DHFR, QSAR in¬ 
hibition studies, 31-32 
Chilies, capsaicin in, 854 
Chime, 369,371,387 
Chiral auxiliary, 810-813 


Chiral catalysts, 814-820 
Chiral centers, 783-785 
Chiral derivatizing agents, 788
Chiral flags, 365,366 
Chirality, 781-787,820-821 
asymmetric synthesis, 
804-820 

chromatographic separations, 
787-793 

classical resolution, 793-799 
enzyme-mediated asymmetric 
synthesis, 804-807 
nonclassical resolution, 
799-804 

Chiral pool, 807-810 
Chiral reagent, 813-814 
Chiral stationary phase, 
787-788,790-791 
Chlopromazine, 692 
Chloramphenicol, 870 
molecular modeling, 150 
4-Chloro-1,3-benzenediol
allergenicity prediction, 834 
Chloromethyl ketones 
protease inhibitors, 761-762 
p-Chlorophenylalanine 

classical resolution by crystal¬ 
lization, 798-799, 800 
Chloroquine, 889,890 
Chlortetracycline, 870 
Cholchicine 

toxicological profile prediction, 
841,842 

Cholecystokinin, 855 

X-ray crystallographic studies, 
484 

Chorismate mutase inhibitors 
transition state analogs, 
753-754 

Chromatographic separation 
of chiral molecules, 787-793 
Chromobacterium violactum, 

873 

Chromosomes 
in genetic algorithms, 87 
Chymotrypsin inhibitors 
affinity labels, 761, 762 
molecular modeling, 118 
QSAR studies, 5, 35-36 
CICLOPS, 223 
Cilazapril 

asymmetric synthesis, 807, 

809 

Cilofungin, 877 

Cinchona bark, quinine from, 

888 
CIP (Cahn-Ingold-Prelog) stereochemistry,
365, 400
Cisplatin 

vindesine with, 860 
cis/trans stereochemistry, 399
Clarithromycin, 849,874, 
875-876 

Classical resolution, of chiral 
molecules, 793-799 
Clavulanic acid, 718,869,870 
Cleaning and transforming data, 
400 

Clenbuterol 

chromatographic separation, 
787,788 

Client-server architecture, 
400-401 

Clipping, 378,401 
Clique search techniques, 262 
CLOB (Character Large Object) 
data type, 401 
Clofibric acid 

antisickling agent, 421, 422 
CLOGP, 18,389 
ClogP, 17-18, 36 
Cloning, 127 

cDNA clone libraries, 341-342 
Clotrimazole, 717 
Clustering methods, 379,401 
for combinatorial library de¬ 
sign, 220 

in molecular modeling, 90-91 
with molecular similarity/di¬ 
versity methods, 205 
CML (Chemical Markup Lan¬ 
guage), 371-372, 401, 405, 
412 

CNS drugs 

complementarity, 134 
natural products as leads, 
849-856 

pharmacophore point filters, 
250 

polar surface area, 245 
CNS program, 478 
Coagulation factor 2 
X-ray crystallographic studies,
484-485

Coagulation factor 7 
X-ray crystallographic studies, 

485 

Coagulation factor 7a 
X-ray crystallographic studies,
485-486

Coagulation factor 9 
X-ray crystallographic studies, 

486 


Coagulation factor 10 
X-ray crystallographic studies, 
484 

COBRA, 255 
R-Cocaine 

dopamine transporter inhibi¬ 
tor, 268 

Codeine, 849,850 
Coformycin, 750-752 
Cognex 

structure-based design, 449 
Colforsin daropate, 849 

Collagenase 

NMR binding studies, 555, 

556 

target of structure-based drug 
design, 443 
CombiBUILD, 227 
CombiChem Package, 386 
CombiDOCK, 217,227 
combinatorial docking, 318 
CombiLibMaker, 378,387 
Combinatorial chemistry, 283, 
358,591-592 
defined, 401 

and molecular modeling, 155 
and natural product screen¬ 
ing, 848 

Combinatorial chemistry data¬ 
bases, 387 

Combinatorial docking, 317-318
Combinatorial libraries, 214 
comparisons, 221-223 
design for molecular similarity 
methods, 190,214-228 
encoding and identification 
with mass spectrometry, 
596-597 

integration, 224-225 
LC-MS purification, 592-594 
optimization, 217-221 
peptidomimetics, 657 
screening for ligands to two 
receptors simultaneously, 
601-602 

structure-based design, 
225-228 

structure/purity confirmation 

with mass spectrometry, 
594-596 

with virtual screening, 317 

S,S-Combretadioxolane, 816, 

819 

Combretastatin A-4,816,819 
CoMFA, See Comparative molec¬ 
ular field analysis 
Compactin, 744,879 


Comparative binding energy 
analysis (COMBINE), 53 
and docking methods, 

304-305 

Comparative molecular field 
analysis (CoMFA), 53-54 
assessment of predictability, 
151-153 
3D, 58-60

and docking methods, 304 
field mapping, 107 
molecular field descriptors, 
56-57 

and molecular modeling, 138, 
147 

Comparative quantitative struc¬ 
ture-activity relationships 
database development, 39 
database mining for models, 
39-41 

Competitive inhibitors, 728-729 
Complementarity, 134 
Comprehensive Medicinal 
Chemistry database, 379, 
386 

Computational Chemistry List, 
360 

Computational protein-ligand 
docking techniques, 

262-264 

Computing technologies, 
334-335, 337. See also 
Chemical information com¬ 
puting systems 
COMSiA, 53, 60 
CONCORD, 363,366,387,401 
3D coordinate generation, 267 
3D descriptors, 55,110 
virtual screening application, 
254 

Concordance, 390,401 
Conformational analysis 
in molecular modeling, 87, 
93-94 

NMR spectroscopy for, 
525-526 

and systematic search, 89-93 
Conformational clustering, 
92-93 

Conformational flexibility, 288 
Conformationally restricted ana¬ 
logs, 694-699 

Conformationally restricted pep¬ 
tides, 636-643 
Conformational mimicry, 
140-142 
Conformational mimicry index,
142

Conglomerate racemates, 
799-800,801,802-803 
Connection tables, 365-368,371, 
401 

file conversion, 372-373
ISIS database, 376 
Connectivity, See Molecular con¬ 
nectivity 
ω-Conotoxins
lead for drugs, 851-852 
NMR spectroscopy, 518-523 
ConQuest search program, 387 
Conscore constraint score, 218 
Consensus scoring, 265-266, 
291,319-320 
and molecular modeling, 
117-118 

Consistent force fields, 102 
Constrained minimization, 
143-144 

Contact matrix, 125-127 
Contignasterol, 886 
CONTRAST, 361 
Conus magus, conotoxins from, 
851 

Convertases 
homology modeling, 123 
CONVERTER, 366,402 
CoQSAR, See Comparative 

quantitative structure-activ¬ 
ity relationships 
CORINA, 366,402 
3D coordinate generation, 267 
3D descriptors, 55 
virtual screening application, 
254 

Cosine coefficient, 68 
COSMIC force field, 80 
Coulomb's law, 80, 82, 285 
and dielectric problem, 83
Coumarin, 882 
Counting schemes 
in druglikeness screening, 
245-246 

Coupling constants, in NMR, 
511,512 

changes on binding, 536-537 
for conformational analysis, 525 
COUSIN, 361,373,387 
and combinatorial library in¬ 
tegration, 224 
Covalent bonds, 6,170 
Covalently binding enzyme in¬ 
hibitors, 720,754-756 
inactivation of, 756-760 


COX-1 inhibitors, 718
X-ray crystallographic studies, 
486 

COX-2 inhibitors, 718 
mass-spectrometric binding 
assay screening, 604 
seeding experiments, 319 
X-ray crystallographic studies, 
486 

CP-96,345, 670,672 
C-QSAR database, 39 

Crambin 

molecular modeling, 124 

Crixivan 

structure-based design, 
438-439 

CROSS-BOW, 361 
Crossfire Beilstein, 385 
Cross-linked enzyme crystals, 
804 

Crosslinking agents, 424-425 
Cross validation, 57, 64 
Cryoprobes 

in NMR screening, 577 
in NMR spectroscopy, 515 
Cryptotheca crypta, 867
Crystallization 
for asymmetric transformation
of enantiomers, 798-799
for enhancing chromatographic
separation of enantiomers, 792-793
in nonclassical resolution,
799-804
CScore, 117 
Curare 

lead for drugs, 856-858 
Cyclic lactams 
conformationally restricted 
peptidomimetics, 640-642 
Cyclic protease inhibitors, 636 
Cyclin-dependent kinase 2 
(CDK2) 

H717 inhibitor pharmaco¬ 
phore, 253 
Cyclo(Gly6)

genetic algorithm exploration 
of conformational space, 88 
Cycloheptadecane 
potential smoothing study, 86 
Cyclooxygenase 1/2 inhibitors, 
See COX-1 inhibitors; 
COX-2 inhibitors 
Cyclophilin, 552 
D-Cycloserine, 717, 719 


Cyclosporin, 848 
molecular modeling, 106 
NMR spectroscopic binding 
studies, 539 
Cyclosporin A 
binding to FKBP, 552-553 
γ-Cystathionase inhibitors,
719-720 
Cysteine 

chemical modification re¬ 
agents, 755 

Cysteine peptidase inhibitors 
transition state analogs, 
652-655 

Cysteine protease inhibitors 
affinity labels, 762 
Cytochrome P450 
homology modeling, 123 
Cytochrome P450 reductase 
X-ray crystallographic studies, 
486 

Cytosine arabinoside, 717, 
867-868 

D2163, 804, 806 

Daemon, 392,402 
Daffodils, drugs derived from, 
892 

Dalfopristin, 876-877 
Spiro-DAMP, 696
4-DAMP 

semirigid analogs, 695-696 
Daptomycin, 848 
DARWIN, 299 

explicit water molecules, 303 
Databases 

for bioinformatics, 345-349
cDNA microarray chips, 345 
chemical information manage¬ 
ment, 384 

commercial systems for drug-sized
molecules, 384-387
comparative QSAR, 39-41 
comparing expressed sequence 
tags with, 342 
history of, 360-363 
knowledge discovery in,
393-395

natural products, 387,597 
for pharmacophore screening, 
254-255 

proprietary and academic, 

387-388 

sequence and 3D structure, 
387 

storing chemical information
in, 373-377
for X-ray crystallography,
478-479

Database tier, 393,403,407 
Data cartridges, 395,402 
Data compression, 402 
Data dictionary table, 375 
Data marts, 391-393, 402 
Data mining, 402,410,411,412 
future prospects, 394-395 
with QSAR, 66-67 
Data warehouses, 390-393, 
402-403 

Dative bonds, 170,365 
Daunomycin 

thermodynamics of binding to
DNA, 183 

DayCart, 386 
Day CGI, 386 

Daylight Chemical Information 
Systems databases 
descriptors, 192 
in virtual screening, 254 
10-Deacetylbaccatin III, 803
Decamethonium, 58 
fragment analogs, 708-710 
lead for drugs, 856-858 
Deceptive fitness functions, 88 
Decision support systems, 403 
Decision tree approach, 247-248 
Deconvolution, 401 
Deduplication, 378,403 
Dehydroalantolactone 
allergenicity prediction, 836 
Demexiptiline, 692-693 
DENDRAL, 393 
De novo design, 113 
(R)-Deoxycoformycin, 750-752
(S)-Deoxycoformycin, 751
DEREK, 246 

Derwent Information databases, 
386 

Derwent Selection database, 386 
Derwent World Drug Index 
(WDI), 379, 386, 387 
Derwent World Patents Index 
(WPI), 386 

10-Desacetylbaccatin III, 863
Descriptor pharmacophores, 
60-63 

Design in Receptor (DiR) ap¬ 
proach, 236 

DHFR, See Dihydrofolate reduc¬ 
tase 

Diamino, 5Y, 6-Z-quinazolines 
QSAR studies of DHFR inhibition,
34-35


Diastereomers, 784 
chromatographic separation, 
788 

Dice coefficient, 68 
Dicoumarol, 882 
Dictionary of Natural Products, 
597

Dideoxyinosine, 717 
Dielectric problem, 83-84 
Dienestrol, 706-707 
Diethylstilbestrol 
stereoisomer analogs, 706-707 
Diffusion-filtered NMR screen¬ 
ing, 570-571 

α-Difluoromethylornithine, 717
Digital Northern, 342 
Dihydroartemisinin, 887 
Dihydrofolate reductase inhibi¬ 
tors, 545, 717 
chemical-shift mapping of 
binding, 545 

comparative molecular field 
analysis, 153 

genetic algorithm study of active
site, 89

genetic algorithm study of 
docking, 88-89 
interaction with methotrex¬ 
ate, 120 

interaction with tri¬ 
methoprim, 151,183 
interaction with trimetrexate, 
531, 557-559

mass-spectrometric binding 
assay screening, 604 
molecular modeling, 114, 115, 
116,147,151 
QSAR studies, 5 
QSAR studies of inhibition by
diamino, 5Y, 6-Z-quinazolines,
34-35

QSAR studies of inhibition by
diamino-5X-benzyl pyrimidines,
39

QSAR studies of inhibition by 
triazines, 31-33 
target of structure-based drug 
design, 425-426 
volume mapping, 140 
X-ray crystallographic studies, 
486 

Dihydromuscimol, 690 
Dihydroorotase 
transition state analogs, 752 
Dihydroorotate dehydrogenase
inhibitors 


X-ray crystallographic studies, 
486 

Dihydropteroate synthetase in¬ 
hibitors, 717 

X-ray crystallographic studies, 
486 

1,4-Dihydropyridines 

chromatographic separation, 
788,789 

Dihydroquinine, 889 
Dihydrotestosterone, 36, 768 
Diller-Merz rapid docking ap¬ 
proach, 292,295 
assessment, 303 
combinatorial docking, 317 
Diltiazem 

nonclassical resolution, 803, 
805 

Dimension tables, 390,403 
N,N-Dimethyldopamine 

alkyl chain homologation ana¬ 
logs, 701 

bioisosteric analogs, 690, 692 
semirigid analogs, 695 
Δ6a,10a-Dimethylheptyl THC, 852
Dimethyl sulfoxide (DMSO) 
force field models for, 176 
Dimethyltubocurarine, 857 
Diphenylmethane, 231 
privileged structures, 252 
2,3-Diphosphoglycerate 
(2,3-DPG), 104,421 
2,3-Diphosphoglycerate 

(2,3-DPG) analogs, 103,104 
Dipolar electrostatic forces, 172 
Dipole-dipole interactions, 6, 82 
Dipole-induced dipole interac¬ 
tions, 173 

Directed tweak algorithm, 260 
Directionality, 140 
DISCO, 58, 60, 256 
and molecular modeling, 147 
Discodermolide 
genotoxicity prediction, 843 
Disintegrins, 652 
Disoxaril 

structure-based design, 
454-455 

Dispersive interactions, 82,174. 
See also van der Waals 
forces 

Dissimilarity approaches, 
189-190,206-208 
Dissociation constant, 286 
Distamycin 

binding perturbations, 544
Distance geometry methods
in molecular modeling, 126, 
142,147 
in QSAR, 60 
in virtual screening, 263 
Distance geometry QSAR tech¬ 
nique, 53 

Distance matrix, 135 
Distance measures 
molecular modeling, 135-137 
molecular similarity/diversity 
methods, 201-202 
Distance range matrix, 135-136 
Dithromycin, 875 
DiverseSolutions, 387 
for molecular similarity/diver¬ 
sity methods, 193-194, 
203-204 

Diversity analysis, 358 
Diversity methods. See Molecular
similarity/diversity
methods

Diversity-property derived 
(DPD) method, 201,203 
Dixon plots, 731-732 
D,L descriptors, for chiral mole¬ 
cules, 783 
DMB-323 

NMR binding studies with 
HIV protease, 560-562 
DMHB 

structure-based design, 424 
DMP 450, 659
structure-based design, 
438-439 
DNA 

molecular modeling, 154 
NMR structural determina¬ 
tion, 535 

noncovalent bonds in, 170 
supercoiling modeling, 95 
synthesis inhibition by phe¬ 
nols, 40 

DNA-binding drugs 

chemical-shift mapping of 
binding to, 544-545 
molecular modeling, 116 
NMR spectroscopy, 547-552 
thermodynamics of binding, 
183 

DNA gyrase inhibitors 
novel lead identification, 321 
DNA helicase PcrA
X-ray crystallographic studies, 
487 

DNA polymerase inhibitors, 342, 
717 


DNA topoisomerase 1 
X-ray crystallographic studies, 
487 

Docetaxel, 849,863 
DOCK 

anchor and grow algorithm, 
296 

assessment, 303,304 
combinatorial docking, 318 
consensus scoring, 266 
empirical scoring, 310 
force field-based scoring, 308 
force-field scoring, 264 
geometric/combinatorial 
search, 295 
ligand handling, 293 
molecular modeling, 112, 113, 
115,116 

molecular modeling of small 
cavity, 106,107 
penalty terms, 313 
performance in structure pre¬ 
diction, 314 

protein and receptor model¬ 
ing, 267 

protein flexibility, 301 
receptor representation in, 

291 

rigid docking, 262-263 

sampling/scoring methods 
used, 261 

seeding experiments, 319 
with site-based pharmaco¬ 
phores, 236 
DOCK4.0 
PMF scoring, 265 
weak inhibitors, 319 
DockCrunch project, 317 
Docking methods. See also Scor¬ 
ing functions; various dock¬ 
ing programs; Virtual 
screening 

assessment, 303-304 
basic concepts, 289-290 
combinatorial, 317-318 
flexible ligands, 293-294,322 
and homology modeling, 
305-306 

and molecular modeling, 
113-118 

and molecular size, 312-313
NOE docking in NMR, 
545-546 

penalty terms, 313 
protein flexibility, 300-302, 
322 


protein-ligand docking soft¬ 
ware, 261 

and QSAR, 304-305 
searching configuration and 
conformation space, 

294-300 

seeding experiments, 318-319 
special aspects, 300-306 
in structure-based virtual 
screening, 260-267 
as virtual screening tool, 
266-267 

water's role, 302-303, 

313-314 

Docking problem, 289 
DockIT, 261 
DockVision, 261 

Dolabella auricularia, 868 
Dolastatin-10, 868,869 
DoMCoSAR approach, 305 
Donepezil 

structure-based design, 449 
L-Dopa, 785 
analogs, 690 

Dopamine 

semirigid analogs, 697 
Dopamine-transporter inhibitors 
pharmacophore model, 256, 
258 

virtual screening, 267-269, 

270 

D-Optimal designs, 65-66 
Dose-response curves, 8 
Dothiepin, 692-693 
Doxepin, 692-693 
DragHome method, 305 
DRAGON, 388-389 
DREAM++, 318 
Drill-down, 391,403 
Dronabinol, 849 
Drug databases, 385-386. See 
also Databases 
Drug Data Report, 379,386 
Drug development, 509-510 
serial design costs, 359 
Druglikeness screening. See also 
Lipinski's " rule of 5 " 
molecular similarity/diversity 
methods, 191 
similarity searching, 383 
virtual screening, 245-250 
Drug-receptor complexes, 
170-179 

low energy state of, 5 
Drug resistance 
antibiotic resistant pathogens,
770
essential pathways versus single
enzyme inhibitor, 495
DrugScore function, 311,312 
assessment, 303
performance in structure pre¬ 
diction, 314 

seeding experiments, 319 
in virtual screening, 315 
Drug screening, See Screening 
Drug-target binding forces, 
170-171 

association thermodynamics, 
170-171,177-179 
energy components for inter- 
molecular noncovalent in¬ 
teractions, 171-174 
example drug-receptor inter¬ 
actions, 181-183 
free energy calculation, 
180-181 

molecular mechanics force 
fields, 174-177 
Drug targets 

and bioinformatics, 351-352 
estimated number of, 50 
X-ray crystallography of published
structures, 482-493

DYLOMMS, 107 

E. coli

mutagenicity prediction, 829 
Eadie-Hofstee plot, 727, 729, 731 
ECEPP force field, 118 
Echinocandins, 877-878 
Ecteinascidia turbinata, 868 
Ecteinascidin-743,848,867-868 
ECTL (Extracting, Cleaning, 
Transforming, and Loading) 
data, 377-379,403 
Edman sequencing, 518 
Edrophonium, 58
Efaproxiral (RSR-13), 422
Eflornithine, 768, 769 
Eigenvector following method, 
292,301 

Einstein-Sutherland equation, 24 
Elan, 387 

Electron cryomicroscopy, 
611-628 

image processing and 3D re¬ 
construction, 624-628 
image selection and prepro¬ 
cessing, 623-624 
three-dimensional, 615-616 
Electron-donating substituents, 
12-15 

growth inhibition by, 41 


Electronic parameters 
in QSAR, 11-15, 50 
Electron lenses, 612 
Electron probability distribu¬ 
tion, 101 

Electron-topological matrix of
congruence, 147
Electron-withdrawing substitu¬ 
ents, 11-15 

growth inhibition by, 41 
Electrospray FTICR mass spec¬ 
trometry, 601-603 
Electrostatic interactions, 
171-172,285 
charge parameterization, 
101-102 

and docking scoring, 308 
enzyme inhibitors, 721, 723 
long range, 177 
molecular modeling, 81-85, 
108-110,140 

and molecular property visual¬ 
ization, 137 
and QSAR, 6-7, 52 
Electrotopological indices, 4 
Elimination algorithms, 207 
EMBL Nucleotide Sequence Da¬ 
tabase, 335 

Embryo tail defects, 40 
EMD 122946, 676
Empirical scoring, 264, 307, 
308-310 

Enalapril, 650,747,881 
asymmetric synthesis, 807, 

809 

conformationally restricted 
peptidomimetics, 640-641 
Enalaprilat, 650, 747 
conformationally restricted 
peptidomimetics, 640-641 
Enantiomeric excess, 784 
enrichment by crystallization, 
800-802 

Enantiomers, 365, 366. See also 
Chirality 

with agonist-antagonist prop¬ 
erties at same receptor, 705 
chromatographic separations, 
787-793 

defined, 783-785 
Enantioselective metabolism, 
786-787 

Enantioselectivity, 784 
Encoding 

and genetic algorithm, 88 
natural products with mass 
spectrometry, 596-597 


Encryption, 403 
Endorphins, 634, 850-851 
model receptor site, 149 
Endothelin 

antagonists, 211, 672-674, 
675,676 

conformationally restricted 
peptidomimetics, 637, 639 
NMR spectroscopy, 523-524, 
526-527 

ENERGI approach, 127 
Energy driven/stochastic search 
strategies, 292, 296-300
Energy of association, 177
English yew, paclitaxel from, 
861-862 

Enkephalins, 634,850-851 
conformationally restricted 
peptidomimetics, 129, 637, 
639 

model receptor site, 149 
Ensemble, 94 
Enthalpy of association 
drug-receptor complexes, 
170-171 
Enteroviruses

target of structure-based drug 
design, 454-456 
Entrainment, 802 
Entropy, 94 
Entropy of association
drug-receptor complexes, 
170-171 

Enumerated structure, 368 
Enumeration, 401,403 
Enzyme-induced inactivators, 
756 

Enzyme-inhibitor complexes, 
721-722 

Enzyme inhibitors, 715-720. See 
also specific Enzymes 
affinity labels, 756-759, 
760-764 

design of covalently binding, 
720, 754-756 

design of noncovalently bind¬ 
ing, 720-754 
examples used in disease 
treatment, 717 
ground-state analogs, 720,
740-741

inactivation of covalently
binding, 756-760
mechanism-based, 759-760,
764-771

multisubstrate analogs, 720,
741-748
pseudoirreversible, 771-774 
rapid, reversible, 720, 

728-734 

slow-, tight-, and slow-tight-binding,
120, 734-740, 749
transition-state analogs, 120, 
748-754 

Enzyme-mediated asymmetric 
synthesis, 804-807 
Enzymes 
as drug targets, 5 
kinetics, 725-728 
pathways and inhibitor de¬ 
sign, 495 

and structural genomics, 352 
Ephedra , 885 
Ephedrine, 884-886 
(-)-Epibatidine, 819-820,821 
Epothilone A, 864 
toxicological profile prediction, 
838-839 

Epothilone B, 864-865 
Epothilone D, 865 
Epoxides 

filtering from virtual screens, 

246 

Equivalence class, 403 
Erythro-9-(2-hydroxy-3-nonyl)adenine
(EHNA)

high-affinity adenosine deami¬ 
nase ligand, 604 

Erythromycin, 870, 871 
Erythromycin macrolides, 

874-876 

Erythro- prefix, 784 
Erythrose 
enantiomers, 784 
Es constant (Taft), 23-24
E-selectin 

NMR screening binding stud¬ 
ies, 572 

E-State index, 26, 54 
Esters 

pharmacophore points, 249 
Estradiol, 706,771 
Estrogen receptor 1 a 
X-ray crystallographic studies, 

487 

Estrogen receptors 
mass-spectrometric binding 
assay screening, 604 
Ethacrynic acid 
antisickling agent, 421 
Ethidium bromide 
thermodynamics of binding to 
DNA, 183 


Etodolac 

classical resolution, 796-797 
Etoposide, 717, 867 
Etorphine, 851 
Euclidean distance, 68,202 
EUDOC 
assessment, 303 
ligand handling, 293 
European Bioinformatics Insti¬ 
tute (EBI), 335 
sequence databases, 387 
Everolimus, 849 
Evolutionary algorithms, 299 
with QSAR, 53-54, 61 
Exact match search, 378, 
379-381,403 
Exchange repulsion energy, 
172-173 

Exemestane, 110,111 
Exhaustive mapping, 398 
Expert Protein Analysis System, 

335 

Expressed sequence tags, 338 
expression level significance, 

342-344 

profiling, 341-342 

Expression analysis/profiling, 

334 

genome-wide, 344-345
for target discovery, 340-345 
Extended stereochemistry, 365, 

404 

External registry number, 379, 
404 

Extrathermodynamic relation¬ 
ship, 26 

E,Z system, 365,399 

Factorial designs, 65-66 
Factor Xa inhibitors, 103,738 
3D pharmacophores, 199 
non-peptide peptidomimetics, 

662, 665

site-based pharmacophores, 

235-236 

target of structure-based drug 
design, 442 

Fact tables, 390, 401, 404 
Failed Reactions database, 385 
Families, 93 

Family competition evolutionary 
algorithm, 299 
FASTA, 347 

Fast ion bombardment, 586,587 
Fastsearch index, 376-377, 399,
404 

FBSS, 202 


FeatureTrees, 316,321 

Fibonacci search method, 11 
Fibrinogen 

virtual screening studies, 

212-213 

Field-based descriptors, 201 
Field effects, 140 
Field mapping, 107 
Fields, 404 
Filtercascade, 267 
Filters, for searching, 315-316, 
376,380,392,404 
Finasteride, 717,768-770 
Fingerprint Generation Pack, 

388 

Fingerprints, 376,378,399, 404 
molecular similarity methods, 

188 

FIRM, 67 

Fitness functions, 87-88 

FK506 

binding to FKBP, 552-555 
NMR spectroscopic binding 
studies, 539 

FK506 binding protein inhibi¬ 
tors, 552-555 
de novo design, 113 
flexible docking studies, 265 
hydrogen bonding in, 288 
target of NMR screening stud¬ 
ies, 565-566, 571 
weak inhibitor screening, 319 
X-ray crystallographic studies,
487 

Flat database storage, 362-363, 

404 

Flat file storage, 360-362,404 
Flecainide 
enantiomers, 786 
FlexE, 301 
Flexibase/FLOG, 263 
Flexible ligands 
in docking methods, 293-294, 
322 

and geometric/combinatorial 

search, 295 

in virtual screening, 263-264 
Flexmatch index, 376 
Flexmatch search, 404 

FlexS, 316 

novel lead identification, 320, 
321 

FlexX 

assessment, 303, 304 
consensus scoring, 266,320 
empirical scoring, 264,310 
explicit water molecules, 302
hydrogen bonding, 319
incremental construction, 
295-296 

molecular modeling, 115 
novel lead identification, 320 
performance in structure pre¬ 
diction, 314 

protein and receptor model¬ 
ing, 267 

receptor representation in, 

291 

sampling/scoring methods 
used, 261 

seeding experiments, 319 
FlexXc extension, 318
Flickering cluster model, of hy¬ 
drophobic interactions, 15 
Flo, 256

Flobufen, 41-42, 42 
FLOG 

explicit water molecules, 303 
ligand handling, 293 
and molecular size, 312-313
seeding experiments, 318 

Flunet 

structure-based design, 451 
Fluorescence spectroscopy, 592 

4-Fluorobenzenesulfonamide
binding to carbonic anhydrase, 538

5'-p-Fluorosulfonylbenzoyl
adenosine (5'-FSBA), 763-764

5-Fluorouracil, 717, 718
Flurbiprofen, 763 
Fluvastatin, 744,879-880 
FOCUS-2D method, 68-69 
Fold compatibility methods, 353 
Fold patterns 

limited number of, 353 
Follicle stimulating hormone 
X-ray crystallographic studies, 
488 

Force fields 

drug-target binding forces, 
170-183 

molecular modeling, 79-81 
parameter derivation, 102-103 
Force-field scoring, 264, 

306-308 

Formestane, 770, 771 
Formula table, 376 
FOUNDATION, 112-113 
Fourier transform ion cyclotron 
resonance (FTICR) mass 
spectrometry, 585,601-603 


4-Point pharmacophores, 408
molecular similarity methods, 

189,196-198,205 
privileged, 231 
virtual screening, 210 
FPL-67047 

structure-based design, 453 
Fractional factorial designs, 66 
Fragment analogs, 707-710 
Fragment-based ligand docking, 
294 
FRED 

ligand handling, 293 
sampling/scoring methods 
used, 261 

Free energy of association, 286 
calculating, 180-181 
drug-receptor complexes, 5, 
170-171 

enzyme-inhibitor complexes, 
722 

Free energy perturbation, 307, 
308 

Free-Wilson approach, in QSAR, 

4, 29-30 

Frontal affinity chromatogra¬ 
phy-mass spectrometry, 601 

5-FSA

structure-based design, 424 
FTDOCK, 115 
Ftrees-FS algorithm, 221 
Fujita-Ban equation, 4 
Fujita-Nishioka analysis, 13 
Functional genomics, 338-340 
Functional group filters 
in druglikeness screening, 
246-247 

Functional mimetics
(peptidomimetics), 636

Fungal natural products, 848, 
893 

Fungal squalene epoxidase in¬ 
hibitors, 717 

Fungal sterol 14α-demethylase
inhibitors, 717 

Fuzzy bipolar pharmacophore 
autocorrelograms, 197 
Fuzzy clustering technique 
with molecular similarity/di¬ 
versity methods, 205 
Fuzzy distance, 57-58 
Fuzzy searches, 376 

G-4120, 663

GABA, See γ-Aminobutyric acid


GABA aminotransferase 

(GABA-T) inhibitors, 718, 

766-767 

X-ray crystallographic studies, 
488 

β-D-Galactoside

saturation transfer difference 
in binding to agglutinin I, 569
Galantamine (galanthamine), 
848,849,892-893 
nonclassical resolution, 802, 
803 

GALOPED, 218 
Gambler 

consensus scoring, 266 
flexible ligands, 263 
seeding experiments, 319 
Gas chromatography, 592 
for enantiomer separation, 

787 

Gas chromatography-mass spec¬ 
trometry (GC-MS), 585-586 
GASP (Genetic Algorithm Simi¬ 
larity Program), 256 
in molecular modeling, 147 
Gas phase association, 177, 178 
G-CSF 3 

X-ray crystallographic studies, 
488 

Gelatinase 

NMR binding studies, 555 
Gel permeation chromatography-mass
spectrometry, 599
GenBank, 335 
growth of, 339 

X-ray crystallography applica¬ 
tion, 481 
GeneChips, 344 
Gene expression, 351. See also 
Expressed sequence tags; 
Expression analysis/profil¬ 
ing 

Gene family approaches, 188, 

244 

subset selection, 190-191 
Gene family databases, 347-349 
Gene nomenclature, 337 
Gene Ontology project, 337 
Generic structures, 367, 368, 
404-405 

Geneseq database, 346 
Genetic algorithms 
and combinatorial library de¬ 
sign, 217,218 

with docking methods, 292,
298-299
with FOCUS-2D method, 
68-69 

inverse folding and threading, 
124-125 
Lamarckian, 299 
in molecular modeling, 87-89, 
117 

with QSAR, 53, 61 
in virtual screening, 263 
GeneTox, 829 

Genie Control Language, 378 
Genitoxants, 840 
Genome annotation, 481,494 
Genome-wide expression analysis,
344-345

Genomics, See Functional 

genomics; Phylogenomics; 
Structural genomics 
GEOCORE, 124 
Geometric atom-pair descrip¬ 
tors, 210 

Geometric/combinatorial search 
strategies, 292,295 
Geometric hashing, 262 
Geometric isomer analogs, 
704-707 

Ghost membranes 
EPR signal changes by ROH, 
27 

Ghrelin, 671, 674 
GHRP-6,671,674 
Gibbs-Helmholtz equation, 286 
Gigabyte, 405 
GLIDE 

sampling/scoring methods 
used, 261 

Global stereochemistry, 398 
Glucocorticoid receptor 
X-ray crystallographic studies, 
488 

Glucose, 784 

Glutamate dehydrogenase, 764 
Glutamate NMDA agonists, 150
Glutamate NMDA antagonists,
150 

Glutamate receptor 1 
X-ray crystallographic studies, 
488 

Glutamic acid 
chemical modification re¬ 
agents, 755 

nonclassical bioisostere ana¬ 
log, 694 

rigid analogs, 699 
Glutamine-PRPP amidotransferase
inhibitors, 717


Glutathione peroxidase 
X-ray crystallographic studies, 
488 

Glycinamide ribonucleotide 
formyltransferase inhibi¬ 
tors, 742-743 

target of structure-based drug 
design, 429-432 
Glycophorin A
potential smoothing study of 
TM helix dimer, 86 
Glycoprotein IIb/IIIa (GpIIb/IIIa)
inhibitors

non-peptide peptidomimetics, 
662-665 

template mimetics, 129, 643, 
645 

Glycopyrrolate 
stereoisomers, 784-785 
Gmelin database, 386 
GOLD 

assessment, 303, 304 
empirical scoring, 309 
flexible ligands, 263 
genetic algorithm, 299 
protein flexibility, 300 
sampling/scoring methods
used, 261 
GOLEM, 67 
GOLPE, 54, 60 
GPCR libraries 
3D pharmacophore finger¬ 
prints for, 205 
GPCR-likeness, 251,252 
GPCRs (G-protein-coupled
receptors), 668

focused screening libraries 
targeting, 209,250 
homology modeling, 123, 150 
molecular modeling, 122 
peptidomimetics, 644, 677 
7-transmembrane, 229-234 
GRAB-peptidomimetic (Group- 
Replacement Assisted Bind¬ 
ing), 636,658-659,677 
GRAMM (Global Range Molecu¬ 
lar Matching), 115 
Granulocyte-macrophage CSF 
X-ray crystallographic studies, 
488 

Graphical representation, 371 
Graph isomorphism problem, 
380,405 
GREEN 

force-field scoring, 264 
GRID, 58,315 
3D pharmacophores, 198 


empirical scoring, 309 
explicit water molecules, 303 
hydrogen bonding, 107 
and molecular modeling, 138 
Gridding and Partitioning (GaP) 
approach, 199,200 
GRID/GOLPE analysis, 304-305 
Grid tyranny, 91,144 
GRIND, 60 

Groove-binding ligands, 5 
Ground-state analog enzyme 
inhibitors, 720, 740-741 
Growth hormone receptor 
X-ray crystallographic studies, 
488 

Growth hormone secretagogues, 
671-672,675 
GS 4071 

structure-based design, 452 
Guanidine 

pharmacophore points, 249 
Gusperimus, 849 

Hall databases, 387 
Haloperidol 

HIV protease inhibitor, 111, 
112 

Halopyrimidines
filtering from virtual screens, 
246 

Hammett constants, 11, 50 
Hammett equation, 12, 13, 26, 

50 

Hamming distance, 202 
Hammond postulate, 748 
Hanes-Woolf plot, 727, 729, 731 
Hansch approach, to QSAR, 

26-27, 30 

Hansch-Fujita-Ban analysis, 31 
Hansch parabolic equation, 3 
Hansch-type parameters, 54 
HARPick program, 218,221 
Hash code, 376,380,405 
and combinatorial library de¬ 
sign, 223 

molecular fragment based, 54 
Hemicholinium 
interatomic distance analogs, 
710-711 
Hemoglobin 

molecular modeling of crystal 
structure, 105,107 
target of structure-based drug 
design, 419-425 
Hepatocyte growth factor activa¬ 
tor inhibitor 1 

matriptase inhibitor, 269, 271





Heroin, 849-850 
HE-State index, 26 
Heterochiral molecules, 782 
Heteronuclear multiple bond 
correlation spectroscopy 
natural products, 518 
Heteronuclear single quantum 
correlation spectroscopy, 
512 

Hexestrol, 707 
Hierarchical clustering, 220, 

401, 405

with molecular similarity/di¬ 
versity methods, 205 
High density oligonucleotide ar¬ 
rays, 344 

High performance liquid chro¬ 
matography (HPLC), 

586-589 

for combinatorial library 
screening, 592-596,598, 
599,607 
fast, 596 

for hydrophobicity determina¬ 
tion, 16-17, 23 
for separation of chiral mole¬ 
cules, 783,788-792 
High-throughput chemistry, 

358, 405

chemical libraries for, 367 
and natural product screen¬ 
ing, 848 

High Throughput Crystallogra¬ 
phy Consortium, 418 
High-throughput screening, 283 
mass spectrometry applica¬ 
tions, 591,592-596 
and molecular modeling, 155 
molecular similarity/diversity 
methods, 191 

raw data points obtained by 
companies, 50 
with virtual screening, 316 
X-ray crystallography applica¬ 
tion, 472 

HIN file format, 369 
HINT descriptors, 56 
Hint!-LogP, 389 
HipHop, 60,256 
Histamine antagonists 
molecular modeling, 143 
Histidine 

chemical modification re¬ 
agents, 755 
Hit list, 380,405,411 


HIV protease inhibitors, 717 
binding-site molecular models, 

130 

comparative molecular field 
analysis, 153 

consensus scoring study, 266 

3D CoMFA, 59 

de novo design, 113 

force field-based scoring study, 

307 

homology modeling, 123 
knowledge-based scoring 
study, 311 

molecular modeling, 103-104, 


non-peptide peptidomimetics, 

659-660 

novel lead identification by 
virtual screening, 320 
seeding experiments, 318-319 
target of structure-based drug
design, 433-442 
transition state analogs, 
647-649 
and water, 302 

HIV reverse transcriptase inhib¬ 
itors, 717 

X-ray crystallographic studies, 

488-489 

H+,K+-ATPase inhibitors, 718
HKL suite, 478 
Hoechst 33258 

binding to DNA, 544,547-552 
Homochiral molecules, 782 
HOMO (Highest Occupied Mo¬ 
lecular Orbital) energy, 11, 

14, 54 

Homology, 348 

and X-ray crystallography, 494 
Homology modeling, 261-262 
and docking methods, 305-306
molecular modeling, 123 
Homo sapiens

genome sequencing, 344 
HQSAR, 4 

HTML (HyperText Markup 

Language), 371, 405 
HUGO Gene Nomenclature 
Committee, 337 
Human Genome Database 
protein classes, 262 
Human genome sequencing, 344 
Human serum albumin, See Se¬ 
rum albumin 


Hydrofinasteride, 768,770 
Hydrogen bonds, 285-286, 365 
acidic protons and π-systems,

313 

and empirical scoring, 

309-310 

enzyme inhibitors, 722,724 
hydrophobic interactions con¬ 
trasted, 319 

molecular modeling, 81, 

107-108
and QSAR, 6 

and structure-based design, 

409 

Hydrolases 

target of structure-based drug 
design, 449-454 
Hydrolysis 

enzyme-mediated asymmetric, 

805-806 

Hydrophobic bond, 15 
Hydrophobic effect, 178,182 
Hydrophobic interactions, 50, 

286,287-288 

discovery of importance of, 3
and empirical scoring, 310 
enzyme inhibitors, 724 
hydrogen bonding contrasted, 

319 

molecular modeling, 85, 

108-110

and QSAR, 6, 7,15-19, 23, 52 
Hydrophobicity, 16-17 
determination by chromatog¬ 
raphy, 17-18, 23 
(S)-3-Hydroxy-γ-butyrolactone,
808,810 

Hydroxychloroquine, 891 

6R-Hydroxy-1,6-dihydropurine
riboside, 752
Hydroxyethylurea, 153 

R-(-)-11-Hydroxy-10-methylaporphine,
705

Hydroxymethylglutaryl-CoA 
(HMG-CoA) reductase in¬ 
hibitors, 718,719,744-746 
(±)-3-(3-Hydroxyphenyl)-N-n-propylpiperidine
(3-PPP),
704-705 

D,L-3,5'Hydroxyvalerate, 745

HYPER, 727 
HypoGen, 256 

Hypothetical descriptor pharma¬ 
cophore, 63 

Iceberg model, of hydrophobic
interactions, 15 


105,108,109,111,117,120, 
122 

NMR binding studies, 533, 

559-562

ICM

affinity grids, 293 
homology modeling, 305 
ligand handling, 294 
Monte Carlo minimization, 
298 

novel lead identification, 320 
IC50 values, 731-732
QSAR, 8 
ID3, 67 
iDEA, 390 
IDEALIZE, 255
Identification. See also Lead 
identification 
combinatorial library com¬ 
pounds, mass spectrometry 
application, 596-597 
mass spectrometry applica¬ 
tion, 594-596 
Idoxuridine, 717
Ifosfamide, 783 
Imines 

filtering from virtual screens, 
246 

Iminobiotin 

binding to avidin, 181,182 
Imipenem, 872-873,874 
Imipramine 
analogs, 692-693 
Immobilized enzyme inhibitors, 
720 

Immunophilins 
chemical-shift mapping of 
binding to, 545 
FK506 binding to FKBP, 
552-555 

Importance sampling, 98 
Incremental construction 
in docking, 292, 295-296 
in virtual screening, 262, 
317-318 

Indexes, 376-377,405 
Indinavir, 648,659 
structure-based design, 
438-439,440,441 
Indomethacin, 453
Inductor variables, 25-26 
Influenza virus neuraminidase 
inhibitors, 717 

InfoChem ChemReact/Chem- 
Synth database, 386 
InfoChem SpresiReact database, 
386 

Infrared spectroscopy, 592 
Inhibitors. See also Enzyme in¬ 
hibitors; specific inhibition 


targets: i.e., Dihydrofolate 
reductase inhibitors 
finding weak by virtual 
screening, 319 
not all are drugs, 408 
structure of free, and struc¬ 
ture-based design, 409 

In-house databases, 387-388 

Inosine monophosphate dehy¬ 
drogenase 

consensus scoring study, 266 
seeding experiments, 318-319 
target of structure-based drug 
design, 446-447 

Inosine monophosphate dehy¬ 
drogenase 2 

X-ray crystallographic studies, 
489 

In silico screening, 191, 244. See
also Virtual screening 

Insulin-like growth factor 1 
X-ray crystallographic studies, 
489 

Insulin-like growth factor 2 
X-ray crystallographic studies, 
489 

Insulin-like growth factor 1 re¬ 
ceptor 

X-ray crystallographic studies, 
489 

Integrin alphaM 
X-ray crystallographic studies, 
489 

Interatomic distance variant an¬ 
alogs, 710-712 

Intercellular adhesion molecule 
1 

X-ray crystallographic studies, 

489 

Interferon a 1 

X-ray crystallographic studies, 

490 

Interferon y 

X-ray crystallographic studies, 
490 

Interleukin 1 

X-ray crystallographic studies, 
490 

Interleukin 2 

X-ray crystallographic studies, 
490 

Interleukin 3 

X-ray crystallographic studies, 
490 

Interleukin 4 

X-ray crystallographic studies, 
490-491 


Interleukin 5 

X-ray crystallographic studies, 
491 

Interleukin 6 

X-ray crystallographic studies, 
491 

Interleukin 8 

X-ray crystallographic studies, 
491 

Interleukin 10 

X-ray crystallographic studies, 
490 

Interleukin 12 

X-ray crystallographic studies, 
490 

Interleukin 13 

X-ray crystallographic studies, 
490 

Interleukin-1β-converting enzyme
(ICE)

transition state analog inhibi¬ 
tors, 655 

Interleukin 1 receptor 

X-ray crystallographic studies, 
490 

Intermolecular forces, 6 

InterPro, 349 
InterProScan, 349 
Intracellular adhesion molecule 
1 (ICAM-1) 

target of NMR screening stud¬
ies using SAR-by-NMR, 

566-567 

Inventory data, 405 
Inverse folding, 123-125 
Inverse QSAR, 4 
Inverted keys, 405 
Ionic bonds, 6,170,365 
Ion-induced dipole interactions, 
173 

Ipconazole, 41 
Irinotecan, 849,861 
Irreversible inhibitors, 755 
ISIS/Base, 377,387 
ISIS databases, 373, 376-377, 
387 

descriptors, 192 
exact match searching, 380 
similarity searching, 382-383 
substructure searching, 382 
ISIS/Direct, 387 
ISIS/Draw, 387 
Isomer search, 388, 405-406 
Isoprenaline, 885 
Isotope editing and filtering, in 
NMR, 545,546 
Iterative cyclic approaches 
and combinatorial library de¬ 
sign, 217 

Ivermectin, 849,891,892 

Japanese Patent and Trademark 
Documents database, 386 
Jarvis-Patrick algorithm, 205,401 
and combinatorial library de¬ 
sign, 220,222 
Java, 396,406,407 
JG-365, 121, 122 
Joins, 390,406 

Journal content searching, 383 

Kaempferol, 865 
Kanamycin, 870, 871 
Karplus relationship, 525 
Kelatorphan, 650,651 
Kennard-Stone method, 65, 66 
Ketoconazole, 717 
8-Ketodeoxycoformycin, 751 
Keto-enol tautomerization 
NMR spectroscopy, 527-528 
Ketolides, 876 
Ketones 

pharmacophore points, 249 
Key-based similarity searching, 
382-383 
Key field, 406 
Keys, 363 
encryption, 403 
molecular similarity methods, 
188 

Khellin, 883, 884 
Kinases 

focused screening libraries 
targeting, 250 

King's Clover, drugs derived 
from, 882 

Kitz-Wilson plots, 757 
K-means clustering, 401,406 
K-medoids clustering, 406 
K-Nearest Neighbors, 53, 62-63 
KNI-272, 562 
Knowledge-based scoring, 
264-265,307,310-312 
Knowledge-bases, 352,379 
Knowledge Discovery in Data¬ 
bases (KDD), 394,406 
Kohonen's Self-Organizing Map 
method, 65, 66 
KOWWIN, 389 
Kubinyi bilinear model, 3, 31 

L-162,313, 669 
L-364,286, 855 


L-365,260, 856 
L-370,518, 660 
L-685,434 

structure-based design, 439 
L-732,747 

structure-based design, 440, 
441 

L-735,525, 797 
L-746,072, 211 

L1210 

growth inhibition, 37, 40-41 
inhibition of DHFR, 32, 34 
L. major DHFR, QSAR inhibi¬ 
tion studies, 33 

Laboratory information manage¬ 
ment systems, 377 
β-Lactamase inhibitors, 718, 
868-874 

X-ray crystallographic studies, 
483 

β-Lactams, 868-874 
Lamarckian genetic algorithm, 
299 

Lamivudine, 812-813,816 
LASSOO algorithm, 217 
Latent inactivators, 756 
Latent semantic structure in¬ 
dexing, 255 
Laudexium, 857,858 
Lead generation, 426 
Lead identification, 244 
focused screening libraries for, 
250-252 

virtual screening for novel, 
320-321 

Lead molecule fragment analogs, 
707-710 
Leaf nodes, 377 
Leave-one-out cross validation, 

57,64 

Leave-some-out cross validation, 
64 

Legion, 387 

Lennard-Jones potential, 285 
Lentinan, 849 

Leucine aminopeptidase inhibi¬ 
tors, 737-738 

Leukocyte function-associated 
antigen 1 (LFA-1) 
target of NMR screening stud¬ 
ies using SAR-by-NMR, 
566-567 

Leukotriene A4 hydrolase 
X-ray crystallographic studies, 
491 

Leveling effect, 722 
Levorphanol, 708 


LH-RH antagonist, 634 
LH-RH peptidomimetic analog, 
640 

LibEngine, 221 
LiBrain, 220 

Libraries, 367, 400. See also 
Combinatorial libraries 
focused screening libraries for 
lead identification, 250-252 
for NMR screening, 574-576 
QSAR for rational design of, 
68-69 
Lidorestat 

structure-based design, 
448-449 

Ligand-based design 
NMR screening for, 564-577 
NMR spectroscopy for, 510, 
517-532 

Ligand-based virtual screening, 
188 

Ligand design, 110-118 

LigandFit 

sampling/scoring methods 
used, 261 

Ligand flexibility. See Flexible 
ligands 
Ligands 

macromolecule-ligand interac¬ 
tions, NMR spectroscopy, 
510,517,535-562 
non-peptidic ligands for pep¬ 
tide receptors, 667-674 
visually assisted design, 110 
Ligand strain energy, 308 
LIGSITE, 291 

Linear free energy relationship, 

12, 14 

Linear interaction energy 
method, 120 

Linear notation, 368-369,406 
Linear QSAR models, 26-28 
descriptor pharmacophores, 

61-62 

Linear regression analysis 
in QSAR, 8-11, 50, 53, 67 
Line-shape, in NMR, 512 
and ligand dynamics, 528-531 
Lineweaver-Burk plot, 727, 729, 
731 

Link nodes, 381,397 
LINUS (Local Independent Nu¬ 
cleating Units of Structure), 
124 

Linux, 396,406,411 
Lipinski’s "rule of 5" 
and combinatorial library de¬ 
sign, 214-215,216 
in druglikeness screening, 245 
for molecular similarity/diversity methods, 193, 208 
and NMR screening, 575 
Lipocortin I 

X-ray crystallographic studies, 
491 

Lipophilic interactions, 286 
Liquid chromatography 
for enantiomer separation, 

787 

Liquid chromatography-mass 
spectrometry (LC-MS), 
586-591 

affinity screening, 598-599 
fast, 596 

future developments, 607-608 
gel permeation chromatogra¬ 
phy screening, 599 
pulsed ultrafiltration screen¬ 
ing, 603-606 

for purification of combinato¬ 
rial libraries, 592-594 
structure/purity confirmation 
of combinatorial libraries, 
594-596 

Liquid chromatography-NMR- 
MS, 608 
Liquids 

force field models for simple, 
176 

Liquid secondary ion mass spec¬ 
trometry (LSIMS), 586,587 
Lisinopril, 650, 881 
asymmetric synthesis, 807, 
809 

Literature content searching, 
383 

LitLink, 387 

Local stereochemistry, 398 
Lock-and-key hypothesis, 251, 
252 

deformable models, 5 
Locus maps, 140 
Log 1/C, 25, 27-29 
Log CR, 25 

Logic, in query features, 406 
Logic and Heuristics Applied to 
Synthetic Analysis 
(LHASA), 379 
Log MW, 24 
Log P 

chloroform-octanol, 17 


chromatographic determina¬ 
tion, 17-18, 23 
estimation systems, 388,389 
for molecular similarity/diver¬ 
sity methods, 193,208 
and polarity index, 26 
Log Perm, 25 
Log TA98, 25-26 
Lomerizine, 41, 42 
Lometrexol 

structure-based design, 
429-430 

London forces, See van der 
Waals forces 
Lopinavir, 648,659 
asymmetric synthesis, 807, 

809 

Lorentz-Lorenz equation, 24 
Lovastatin, 878-879 
Low Mode Search, 292,301 
LUDI, 259, 295-296, 315 
combinatorial docking, 318 
empirical scoring, 310 
in molecular modeling, 112, 
113 

for novel lead identification, 
321 

LUMO (Lowest Unoccupied Mo¬ 
lecular Orbital) energy, 11, 

14, 26, 54 

Luteinizing hormone β 
X-ray crystallographic studies, 
491 

LuxS 

X-ray crystallographic struc¬ 
ture elucidation, 494-495 
LW-50020, 849 
LY-303366, 877 
LY-315920 

structure-based design, 454 
Lycopene 

positive ion APCI mass spec¬ 
trum, 588 

tandem mass spectrum, 591 
Lymphomas, 718 
Lysine 

chemical modification re¬ 
agents, 755 

MACCS3D, 259,363 
in molecular modeling, 111 
MACCS (Molecular Access System), 254, 361-362 
Machine learning techniques 
in molecular modeling, 151 
in QSAR, 62 

Macrocyclic mimetics, 635-636 


MACROMODEL, 94 
Macromolecular structure deter¬ 
mination, 334 
NMR spectroscopy applications, 533-535 

Macromolecule-ligand interac¬ 
tions, See Protein-ligand 
interactions 
Macrophage CSF 1 
X-ray crystallographic studies, 
491 

MACROSEARCH, 94 
Magnetization transfer NMR, 
568-570 

Ma huang, 884-885 
Mandelate racemase inhibitors, 
762,763 

Manhattan distance, 68 
Marcaine 

classical resolution, 795 
Marijuana, 853 
Marine source drugs 
antiasthma, 886 
anticancer, 867-868 
Markup languages, 371-372,405 
Markush feature, 381 
Markush structures, 367,368, 
373,406 
MARPAT, 385 
Masoprocol, 849 
Mass spectrometry, 583-592 
affinity capillary electrophoresis-mass spectrometry, 
599-600 

affinity chromatography-mass 
spectrometry, 598-599 
bioaffinity screening using 
electrospray FTICR MS, 
601-603 

encoding and identification of 
combinatorial compounds 
and natural product extracts, 596-597 
frontal affinity chromatogra¬ 
phy-mass spectrometry, 601 
future directions, 607-608 
gel permeation chromatogra- 
phy-mass spectrometry, 599 
LC-MS purification of combinatorial libraries, 592-594 
MS-based screening, 597-598 
pulsed ultrafiltration-mass 
spectrometry, 603-606 
solid phase mass spectromet- 
ric screening, 606-607 
structure/purity confirmation 
of combinatorial com¬ 
pounds, 594-596 
types of mass spectrometers, 
585 

Material Safety Data Sheets da¬ 
tabase, 386 

Material Safety Data Sheet 
searching, 384 
Matriptase 

virtual screening of inhibitors, 
269-271,272 

Matrix-assisted laser desorption 
ionization (MALDI), 586, 
596,606-607 

Matrix metalloprotease inhibitors, 227, 555-557 
chemical-shift mapping of 
binding to, 545 

target of NMR screening stud¬ 
ies using SAR-by-NMR, 566 
target of structure-based drug 
design, 443-445 
transition state analog inhibi¬ 
tors, 651-652 
virtual screening, 315 
Maximum Auto-Cross Correla¬ 
tion (MACC), 202 
Maxmin approach, 208 
May apple, drugs derived from, 
865 

Maybridge catalog, 385 
MCDOCK 

Monte Carlo simulated an¬ 
nealing, 297 

MD Docking (MDD) algorithm, 
298 

MDL Information Systems, Inc. 

databases, 386-387 
Mechanism-based enzyme inhib¬ 
itors, 759-760,764-771 
Mefloquine, 889-890 
artemisinin potentiates, 
887-888 

Meglumine, 796-797 
Melagatran 

structure-based design, 442, 
444 

α-Melanotropin 
conformationally restricted 
peptidomimetics, 637 
Melatonin 
analogs, 693 
antagonists, 211-212 
Melilotus officinalis (ribbed me- 
lilot), 882 


Melittin 

molecular modeling, 124 
Members, of Rgroups, 368,406 
Membrane-bound drug targets, 
351 

Membrane-bound proteins 
molecular modeling, 154 
NMR structural determina¬ 
tion, 535 

Membrane-bound receptors, 5 
Mepartricin, 849 
Meperidine, 708,851 
rigid analog, 696 
6-Mercaptopurine, 717 
Mercury search program, 387 
Merged Markush Service, 386 
Merimepodib 

structure-based design, 447 
MERLIN, 39,386 
Messenger RNA, See mRNA 
Metabolism. See also Absorp¬ 
tion, distribution, metabo¬ 
lism, and excretion (ADME) 
enantiomers, 786-787 
Metabolism databases, 385,386 
Metabolism screening, 591 
pulsed ultrafiltration applica¬ 
tion, 605 

Metabolite database, 386 
Metadata, 375,376,406 
Meta-layer searching, 395 
Metallopeptidase inhibitors 
transition state analogs, 
649-652 
Metamitron, 42 
Metazocine, 708 
Metconazole, 41, 42 
Methadone, 708 
Methamphetamine 

ring substitution analogs, 704 
R-Methanandamide, 852,853 
Methanol 

force field models for, 176 
Methicillin, 869,870,871 
Methionine:adenosyl transferase, 148 

Methionine hydrochloride 
nonclassical resolution, 803 
Methods in Organic Synthesis 
database, 385 

Methotrexate, 717, 718, 749 
interaction with dihydrofolate 
reductase, 120 
structure-based design, 425 
N-Methylacetamide 
force field models for, 176 


2-Methyl-1,4-benzenediol 

allergenicity prediction, 833 
α-Methyldopa, 785 

5,10-Methylene-tetrahydrofolate, 426, 427 

Methyl group roulette, 700 
Methylphenidate (Ritalin) 
classical resolution, 793-794 
nonclassical resolution, 801 
Metocurine, 856,857 
Metoprolol 
renal clearance, 38 
Metropolis algorithm, 94, 98 
D,L-Mevalonate, 745 
Mevastatin, 744-745 
MHCI receptor 
homology modeling, 123 
molecular modeling, 117 
Michaelis-Menten constants, 
725-728 

use in QSAR, 7,8 
Michaelis-Menten kinetics, 
725-728 

Microarray chips, 334,344-345 
Microbial secondary metabolites, 
848 

MicroPatent, 386 

Microsoft Access, 373 
Middle tier, 392, 406-407 
Miglitol, 849 
Milbemycins, 891,892 
L-Mimosine 
analogs, 690 
MIMUMBA, 255 
Mini-fingerprints, 255 
Minimum topological difference 
(MTD) method, 4,147 
Mining minima algorithm, 292, 
299-300 

Mitogen-activated protein ki¬ 
nase 

target of structure-based drug 
design, 456-459 
Mivacurium, 857,859 
Mixtures, 367-368 
Mizoribine, 849 
MK-329, 855 
MK-383, 213 
MK-499, 814-815, 818 
MK-0677, 671, 674 
MK-678, 657 
ML-236B, 879 
MLPHARE, 478 
MM-25 

structure-based design, 423, 
424 
MM-30 

structure-based design, 423, 
424 

MM2 force field, 80, 307 
MM3 force field, 80,118 
MM-PBSA method, 315 
Modeling, See Molecular model¬ 
ing 

Model mining, 3 
Model receptor sites, 149-150 
Molar refraction, 24, 54 
MOLCONN-Z, 55,192,389 
Molecular Biology Database Col¬ 
lection, 345 

Molecular comparisons, 138-142 
Molecular connectivity, 192,407 
estimation systems, 388 
in QSAR, 26, 55, 56, 61 
Molecular docking methods, See 
Docking methods 
Molecular dynamic simulations. 
See also Monte Carlo simu¬ 
lations 

barrier crossing, 98 
with docking methods, 292, 
298 

and force field-based scoring, 
308 

hydrogen bonds, 107 
in molecular modeling, 85, 93, 
95-100,116-117,142 
and non-Boltzmann sampling, 
100 

protein flexibility, 301-302 
statistical mechanical, 94, 95 
of temperature, pressure, and 
volume, 96 

thermodynamic cycle integra¬ 
tion, 99 

in virtual screening, 263 
water's role in docking, 
302-303 

Molecular eigenvalues, 54 
Molecular electrostatic potential, 
102 

Molecular extensions, 130-131 
Molecular field descriptors, 54, 
55-57 

Molecular Graphics and Model¬ 
ing Society, 360 
Molecular holograms, 54 
Molecular mechanics, 79-100 
force fields, 174-177 
Molecular modeling, 77-79, 
153-154,358 

affinity calculation, 118-122 
and bioinformatics, 351 


common patterns, 142-150 
conformational analysis, 87, 
93-94 

and electrostatic interactions, 
81-85 

and force fields, 79-81 
known receptors, 103-127 
ligand design, 110-118 
molecular comparisons, 
138-142 

and molecular mechanics, 
79-100 

pharmacophore versus binding 
site models, 127-135 
potential surfaces, 85-89 
protein structure prediction, 
122-127 
and QSAR, 5 
and quantum mechanics, 
100-103 

similarity searching, 135-138 
site characterization, 105-110 
and statistical mechanics, 
94-95 

in structure-based design, 419, 
420 

systematic search, 89-94,116 
unknown receptors, 127-153 
and virtual screening, 244 
Molecular multipole moments, 54 
Molecular property visualiza¬ 
tion, 137-138 
Molecular recognition, 283 
and hydrophobic interactions, 
15 

physical basis of, 284-289 
Molecular replacement, 477 
Molecular sequence alignment, 
353 

Molecular sequence analysis 
bioinformatics for, 335-336 
Molecular shape analysis, 53 
Molecular shape descriptors, 54 
Molecular similarity/diversity 
methods, 54, 188-190 
analysis and selection meth¬ 
ods, 203-209 

combinatorial library design, 
190,214-228 
descriptors for, 191-203 
example applications, 228-237 
future directions, 237 
and molecular modeling, 
135-138 

virtual screening by, 188,190, 
209-214 


Molecular structure descriptors 
in QSAR, 26 

Molecular targets, See Drug tar¬ 
gets 

Molecular weight 

for molecular similarity/diver¬ 
sity methods, 193,208 
and QSAR, 24-25 
MOLGEO, 255 
Molinspiration, 390 
MOLPAT, 110-111 
Monascus ruber, 879 
Monoamine oxidase inhibitors, 
718 

Monobactams, 873 
Monacolin K, 879 
Monomer Toolkit, 377-378 
Monte Carlo simulated anneal¬ 
ing 

and combinatorial library de¬ 
sign, 218 

with docking methods, 292, 
297 

with virtual screening, 263 
Monte Carlo simulations. See 
also Molecular dynamic sim¬ 
ulations 

barrier crossing, 98 
and combinatorial library de¬ 
sign, 217 

de novo design, 113 
with docking methods, 292, 

297-298 

in molecular modeling, 85, 86, 
93, 96-99,116-117,142 
and non-Boltzmann sampling, 
100 

statistical mechanical, 94, 95 
thermodynamic cycle integra¬ 
tion, 99 

in virtual screening, 263 
Moore's Law, 393 
Morgan algorithm, 378,407 
Morphiceptin, 144,145 
Morphinans, 850 
Morphine, 634 
ecological function, 848 
fragment analogs, 707-708 
Morphine alkaloids, 849-851 
Mosflm/CCP4, 478 
Most descriptive compound 
(MDC) method, 207-208 
mRNA 

and expression profiling, 
340-341 

MSDRL/CSIS, 361 
MS-MS, See Tandem mass spectrometry (MS-MS) 

Mulliken population analysis, 

101-102 

MULTICASE SAR method 
toxicity prediction application, 

828-843 

Multidimensional databases, 

390,407 

Multidimensional NMR spec¬ 
troscopy, 512-514 
Multidimensional scaling, 201 
Multidimensional scoring, 291 
Multilevel chemical compatibil¬ 
ity, 249 

Multiple-copy simultaneous 
search methods (MCSS), 
298 

Multiple isomorphous replace¬ 
ment (MIR) phasing, 477 
Multiple regression analysis 
in QSAR, 8-11, 50, 52, 53 
Multisubstrate analog enzyme 
inhibitors, 720,741-748 
Multi-tier architecture, 392,407 
Multiwavelength anomalous dif¬ 
fraction (MAD) phasing, 
474,477-478 

Munich Information Center for 
Protein Sequences (MIPS), 

335 

Muscarinic receptors 
distance range matrices, 136 
stereoisomer analogs, 705-706 
Mutation 

in genetic algorithms, 87, 88 
MVIIA (Ziconotide) 

NMR spectroscopy, 518-523, 
526,534 

MVT-101, 103-104, 105, 117 

Mycophenolate mofetil, 849 
Mycophenolic acid 
structure-based design, 

446-447 

Myoglobin, 419 

Nabilone, 853 
Nadolol 

renal clearance, 38 
Naftifine, 717 

Na+,K+-ATPase inhibitors, 718 
Nalorphine, 850 
Naloxone, 850 
NAPRALERT, 597 

Naproxen 

classical resolution, 794-795 


enzyme-mediated asymmetric 
synthesis, 805 
Narwedine, 802,803 
National Cancer Institute data¬ 
base, 222, 254, 385-386, 387 
National Center for Biotechnol¬ 
ogy Information (NCBI), 

335 

sequence databases, 387 
National Toxicology Program, 

246,829 

Natural product mimetics, 636 
Natural products 
antiasthma drug leads, 

883-886 

antibiotics drug leads, 

868-878 

anticancer drug leads, 

858-868 

antiparasitic drug leads, 

886-891 

cardiovascular drug leads, 

878-883 

CNS drug leads, 849-856 
drugs derived from, 
1990-2000,849 
extract encoding and identifi¬ 
cation, 596-597 
leads for new drugs, 847-894 
neuromuscular blocking drug 
leads, 856-858 
NMR structure elucidation, 
517-518 

Natural products databases, 387, 
597 

Nearest neighbors methods, 53, 

62-63, 67 

Neighborhood behavior, 211 

Nelfinavir, 648 

asymmetric synthesis, 

817-818 

structure-based design, 440, 
442 

Neomycin, 870,871 
Netropsin 

binding perturbations, 544 

Neu5Ac2en 

structure-based design, 451 
Neural networks, See Artificial 
neural networks 
Neuraminidase inhibitors, 717 
flexible docking studies, 265 
PMF function application, 314 
ScreenScore application, 319 
target of structure-based drug 
design, 450-452 


X-ray crystallographic studies 
[influenza B virus], 491 
Neuroleptics 
molecular modeling, 150 
Neuromuscular drugs 
natural products as leads, 
856-858 
Neuropeptide Y 
X-ray crystallographic studies, 
492 

Neuropeptide Y inhibitors, 671, 
673,674 

Neutral endopeptidase (NEP), 
650-651 

Nitric oxide synthase, 736 
Nitric oxide synthase inhibitor, 
738-739 
Nivalin, 892 

NK receptor antagonists, 

669-670,672 

NMR, See Nuclear Magnetic 
Resonance (NMR) spectros¬ 
copy 

NMR timescale, 537 
NN-703, 671, 675 
NOE, See Nuclear Overhauser 
effects (NOE), in NMR 
Nolatrexed, 428 
Non-Boltzmann sampling, 100 
Nonclassical bioisosteres, 
690-694 

Nonclassical resolution, of chiral 
molecules, 799-804 
Noncompetitive inhibitors, 

730-731 

Noncovalent bonds, 6,170 
energy components for inter- 
molecular drug-target bind¬ 
ing, 171-174 

Noncovalently binding enzyme 
inhibitors, 720-754 
Nonisosteric bioanalogs, 

689-694 

Nonlinear QSAR models, 28-29 
descriptor pharmacophores, 

62-63 

Nonlinear regression, 67 
Non-overlapping mapping, 398 
Non-peptide peptidomimetics, 

636,657-674 

Nonpolar interactions, See van 
der Waals forces 
Nonstructural chemical data, 

373 

Norapomorphine 
alkyl chain homologation ana¬ 
logs, 701 
Norfloxacin, 41, 42 

Norstatine, 652 

Norvir 

structure-based design, 438, 
440 

Nostructure, 410 
NOT logical operator, 406 
NPS 1407, 812, 815 
Nuclear hormone receptors 
focused screening libraries 
targeting, 250 

Nuclear Magnetic Resonance 
(NMR) imaging, 510 
Nuclear Magnetic Resonance 
(NMR) screening methods, 
510,562-577 
capacity issues, 190 
Nuclear Magnetic Resonance 
(NMR) spectroscopy, 351, 
507-514,592. See also SAR- 
by-NMR approach 
applications, 516-517 
chemical shift mapping, 
543-545 

instrumentation, 514-516 
with LC-MS, 608 
ligand-based design, 510, 

517-532 

macromolecule-ligand interac¬ 
tions, 510, 517, 535-562 
metabolic, 510 
and molecular modeling, 78 
multidimensional, 512-514 
for pharmacophore modeling, 
531-532 
receptor-based design, 510, 
532-562 

in structure-based drug de¬ 
sign, 419,516-517 
and structure-based library 
design, 225 

structure determination of 
bioactive peptides, 517-518 
structure elucidation of natu¬ 
ral products, 517-518 
and virtual screening, 244 
Nuclear Magnetic Resonance 
(NMR) titrations, 545 
Nuclear Overhauser effect 
(NOE) pumping, 573 
Nuclear Overhauser effects 
(NOE), in NMR, 511,512 
for conformational analysis, 
525 

and distance range matrix, 
136 


for macromolecular structure 
determination, 533 
and NMR screening, 571-573 
NOE docking, 545-546 
transferred NOE technique, 
532 

Nucleic acid receptors, 5 
Nucleic acids. See also DNA, 
RNA 

biochemical force fields, 
175-176 

NMR structural determina¬ 
tion, 535 

Nucleotide intercalation, 183 

O (graphics program), 478 
Object-oriented language, 407 
Object relational database, 407 
Octreotide, 657 

Octanol/water partitioning sys¬ 
tem, 16-17 

OLAP (OnLine Analytical Pro¬ 
cessing), 390,408 
Oleandomycin, 870,871 
OLTP (OnLine Transaction Pro¬ 
cessing), 390,408 
Omapatrilat, 651 
OMEGA, 255 
Ondansetron 

nonclassical resolution, 802 

OpenBabel, 372 

Open Molecule Foundation, 360 
Open reading frames 
housing in DNA databases, 
338 

Opium poppy, 848,849 
Optimization approaches 
for combinatorial library de¬ 
sign, 217-220 
OptiSim method, 207 
and combinatorial library de¬ 
sign, 220 
Oracle, 373 

Organic structure databases, 

385 

Organoarsenical agents, 717 
Orientation map (OMAP), 131, 
144,146 

Oriented-substituent pharma¬ 
cophores, 224 
Orlistat, 848,849 
OR logical operator, 406 
use in molecular similarity/ 
diversity methods, 194 
Ornithine decarboxylase inhibi¬ 
tors, 717, 766, 768, 769 
Oseltamivir, 452, 717 


OSPPREYS (Oriented-Substituent Pharmacophore PRopErtY Space), 199, 224 
Overlapping mapping, 398 
OWFEG (one window free en¬ 
ergy grid) method, 308,315 
Oxidation 

enzyme-mediated asymmetric, 
806 

Oxidoreductases 
target of structure-based drug 
design, 445-449 
Oxprenolol 
renal clearance, 38 
Oxytetracycline, 870 

P. carinii DHFR, QSAR inhibi¬ 
tion studies, 32-33 
Pacific yew, paclitaxel from, 
861-862 

Paclitaxel, 843,848,861-863 
Pairwise interactions, 79-80 
PALLAS System, 389 
Paluther, 887 
Pamaquine, 888-889 
Pancreatic polypeptide 
molecular modeling of avian, 
124 
Papain 

QSAR studies, 5 
transition state analog inhibi¬ 
tors, 654 

Papaver somniferum (opium poppy), 848, 849 
Parallel chemistry, 283 
Parallel library, 214 
Parallel processing, 408 
Parathion, 774 
Parathyroid hormone 
X-ray crystallographic studies, 
492 

Parent structure, 368,404 
Pareto optimality, 220 
Partial charge, 366,373 
Partition coefficients, 16-17, 54 
Partition function, 94-95 
Partitioning algorithms, 67 
PASS, 291,390 
Patent Citations Index, 386 
Patent databases, 386 
Patent searching, 383-384 
Pathways, 495-496 
X-ray crystallographic analy¬ 
sis, 495-496 
Pattern recognition, 408 
and cluster analysis, 401 
with QSAR, 53 
PC cluster computing, 283-284 

PCModels, 386 
PD-119229 

structure-based design, 
460-461 

PDB file format, 369 
PDGF beta 

X-ray crystallographic studies, 
492 

Peak intensities, in NMR, 512 
Peldesine 

structure-based design, 460 
Pemetrexed 
structure-based design, 
429-430 

Penicillins, 717,868-870 
preventing bacterial degrada¬ 
tion, 718 

Penicillopepsin inhibitors 
molecular modeling, 116 
Penicillium brevicompactum, 879 

Penicillium chrysogenum, 869 
Penicillium citrinum, 879 
Pentostatin, 717,750-751,849 
PeptiCLEC-TR, 804 
Peptide backbone mimetics, 636, 
644-645 

Peptide bond isosteres, 644-646 
Peptides, 634 
biochemical force fields, 
175-176 

NMR structural determina¬ 
tion of bioactive, 517-518 
non-Boltzmann sampling of 
helical transitions, 100 
Peptidomimetics, 128-129, 
633-634 

classification, 634-636 
conformationally restricted 
peptides, 636-643 
future directions, 674-677 
molecular modeling, 154 
non-peptide, 636,657-674 
peptide bond isosteres, 
644-646 

protease inhibitors, 646-655 
speeding up research, 655-657 
template mimetics, 643-644 
Peramivir 

structure-based design, 452 
Personal chemical databases, 
387-388 
Petabyte, 408 
Pethidine, 851 
PETRA, 390 
Petrosia contignata, 886 


Pfam, 349 

Pharmacophore keys, 376, 
408-409 

Pharmacophore mapping, 255 
Pharmacophore point filters, 
196,249-250 
Pharmacophores, 368 
with BCUT descriptors, 
223-224 

binding site models con¬ 
trasted, 127-135 
defined, 252-253,408 
descriptor, for QSAR, 60-63 
3D searching, 366-367 
in molecular modeling, 110 
for molecular similarity/diver¬ 
sity methods, 194-201, 
204-206 

NMR-based modeling, 

531-532 

NMR spectroscopy-based modeling, 531-532 
oriented-substituent, 224 
site-based, 235-237 
virtual screening, 252-260 
PharmPrint method, 223 
Phase problem 
in X-ray crystallography, 
476-478 
Phencyclidine 
rigid analogs, 696-697 
β-Phenethylamines, 697-698 
Phenols 

DNA synthesis inhibition by, 
40 

growth inhibition by, 38, 
40-41 

Phenylacetic acids 
ionization of substituted, 

12-14 

Phenylethanolamine N-methyltransferase (PNMT) inhibitors, 733-734, 740 
(R)-α-Phenylglycidate, 762 
(S)-α-Phenylglycidate, 762 
N’-(R-Phenyl)sulfanilamides 
antibacterial activity, 10 
Phosphatidylcholine monolayers 
penetration by ROH, 27 
Phosphocholine 
docking to antibody McPC603, 
298 

Phosphodiesterases 
alignment of catalytic domains 
in gene family, 349 
Phospholipase A2 
homology modeling, 123 


target of structure-based drug 
design, 453-454 
X-ray crystallographic studies, 
492 

Phosphonoacetate, 740 
N-Phosphonoacetyl-L-aspartate 
(PALA), 743-744 
Phosphonoformate, 740 

2-(Phosphonomethoxy)ethylguanidines 

chain branching analogs, 702 
(R)-9-[2-(Phosphonomethoxy)propyl]adenine (R-PMPA), 818-819 
Phosphoryl transferases 
target of structure-based drug 
design, 456-461 
Phylogenomics, 347-349 
Physicochemical descriptors, 54 
estimation systems, 389 
for molecular similarity/diver¬ 
sity methods, 193 
for virtual screening, 255 
Physicochemical properties, 373, 
409 

Physostigmine, 774 
Picornaviruses 

target of structure-based drug 
design, 454-456 
Picovir 

structure-based design, 
455-456 

Pigeon liver DHFR, QSAR inhi¬ 
bition studies, 31-32 
Pipecolic acid, 805 
Pirlindole 

chromatographic separation, 
788-789,790 

Pit viper, drugs derived from, 
881 

Pivoting data, 409 

Plant natural products, 848,893 

Plant secondary metabolites, 

848 

Pleconaril 

structure-based design, 
455-456 
PLOGP, 389 
PLP function, 266,309 
consensus scoring, 320 
hydrophobic interactions, 319 
performance in structure pre¬ 
diction, 314 

seeding experiments, 319 

PLUMS, 225 
p38 MAP kinase 

consensus scoring study, 266 
seeding experiments, 318-319 
PMF function, 265,311,312 
performance in structure pre¬ 
diction, 314 

seeding experiments, 319 
PMML (Predictive Model 
Markup Language), 405 
PNU-107859 
NMR binding studies, 555 
PNU-140690, 659, 812, 813 
PNU-142372 

NMR binding studies, 

555-556 
POCKET, 259 
Podophyllin 

drugs derived from, 865-867 
Podophyllotoxin, 849, 865-866 
Podophyllum emodi, 865 
Podophyllum peltatum (May ap¬ 
ple), 865 

Poisson-Boltzmann equation, 
83,84 

Polarity index, 26 
Polarizability, 85 
Polarizability index, 11 
Polarization energy, 173 
Polar surface area 
in druglikeness screening, 245 
Policosanol, 849 
Pomona College Medchem, 385 
Potassium channel shaker 
X-ray crystallographic studies, 
492 

Potential smoothing, 86 
Potential surfaces, 85-89 
PPARγ 

X-ray crystallographic studies, 

492 

Pralnacasan 

structure-based design, 443, 

446 

Pravastatin, 879, 880 
Preliminary screening, 111-112 
Pressure 

molecular dynamic simula¬ 
tion, 96 

Primaquine, 889 
Principal components analysis 
with molecular similarity/di¬ 
versity methods, 192,201 
in QSAR, 15 

Principal components regres¬ 
sion, 53 
Prindolol 
renal clearance, 38 


Prinomastat 

structure-based design, 444, 

446 

PRINTS, 335,349 
Privileged structures 
in molecular similarity/diver¬ 
sity methods, 209 
template mimetics, 644 
in virtual screening, 251-252 
PROBE, 126 

PROCHECK program, 478 

ProDock 

affinity grids, 293 
Monte Carlo minimization, 

298 

ProDom, 349 

Proflavin 

thermodynamics of binding to 
DNA, 183 

Progesterone receptor 
antibody FAB fragment, 128 
X-ray crystallographic studies, 

492 

PROGOL, 67 

Project Library, 387 
Prolactin receptor 
X-ray crystallographic studies, 

492 

PRO-LEADS, 299 
assessment, 303,304 
flexible ligands, 263 
PRO-LIGAND 
genetic algorithm with, 89 
Pronethalol, 881 
Propargylglycine, 719-720 
Property-based design, 234-235 
Propranolol, 881-882 
enantiomers, 786 
enzyme-mediated asymmetric 
synthesis, 805-806 
renal clearance, 38 
N10-Propynyl-5,8-dideazafolate, 
426,427 

Proresid, 866-867 
PRO-SELECT 
combinatorial docking, 318 
PROSITE, 348-349 
Prostacyclin, 762-763 
Prostaglandin synthase inhibi¬ 
tors, 718,762-763,764 
Protaxols, 863 
Protease inhibitors. See also 
HIV protease inhibitors 
affinity labels, 762 
QSAR studies, 5 
structural genomics, 353 


target of structure-based drug 
design, 432-445 
transition state analogs, 

646-655 

Protein classes, 262 
Protein Data Bank, 110,353 
sequence database, 387 
X-ray crystallography applica¬ 
tion, 478-479 
Protein Database 
and virtual screening, 261-262 
Protein families 
targeting in libraries for vir¬ 
tual screening, 251 
Protein interactions, 334 
Protein-ligand docking pro¬ 
grams, 292 

Protein-ligand docking tech¬ 
niques, 262-264 
Protein-ligand interactions, 

284-289,322 

NMR spectroscopy, 510,517, 
535-562 
QSAR studies, 5 
scoring, 264-267 
scoring in virtual screening, 
264-266 

Protein-protein interactions, 634 
characterizing, 637 
Proteins. See also Macromolecu- 
lar structure determination 
binding and chirality, 786-787 
flexibility and docking, 
300-302,322 
phylogenetic profiling, 

347-348 

Protein structures 
prediction, 122-127 
in structure-based virtual 
screening, 261-262 
X-ray crystallographic analy¬ 
sis, 496 
Proteome, 352 
Proteomics, 409 
Pseudoirreversible enzyme in¬ 
hibitors, 771-774 
Pseudomonas acidophila, 873 
Pseudopeptides, 635-636 
isosteres replacing peptide 
backbone groups, 646 
Pseudoracemate, 799-800,801 
Pseudo-receptor models, 261 
PSI-BLAST, 335,347 
X-ray crystallography applica¬ 
tion, 481 

Pulsed ultrafiltration-mass spec¬ 
trometry, 603-606 
Purine biosynthesis inhibitors, 
752 

Purine nucleoside phosphorylase 
target of structure-based drug 
design, 459-461 
Purine ribonucleoside, 751-752 
Purity verification 
as bottleneck in drug discov¬ 
ery, 592 

LC-MS-based purification, 

592-594 

mass spectrometry applica¬ 
tion, 594-596 

Pyridoxal phosphate-dependent 
enzymes 

mechanism-based inhibitors, 
765-768 

Pyrimidine biosynthesis inhibi¬ 
tors, 752 
Pyrrolinones 

peptide-like side chains, 635, 
642 

Pyruvate dehydrogenase inhibi¬ 
tors, 717 

Pyruvate kinase, 764 

Qinghaosu (artemisinin), 886 
Q-jumping MD, 298 
QSAR, See Quantitative struc¬ 
ture-activity relationships 
QSAR and Modeling Society, 360 
QSDock, 295 
QSiAR, 53, 60 

QSPR, See Quantitative struc¬ 
ture-property relationships 
Quadratic shape descriptors, 295 
Quadrupole time-of-flight hybrid 
(QqTOF) mass spectrome¬ 
try, 585,607 
QUANTA, 258 

Quantitative structure-activity 
relationships, 1-4, 49-52, 
358. See also Comparative 
quantitative structure-activ¬ 
ity relationships; 3D quanti¬ 
tative structure-activity re¬ 
lationships 

applications with interactions 
at cellular level, 37-38 
applications with interactions 
in vivo, 38-39 

applications with isolated re¬ 
ceptor interactions, 30-37 
2D, 52, 53 
data mining, 66-67 
defined, 409 


descriptor pharmacophore 
concept, 60-63 
and docking methods, 

304-305 

Free-Wilson approach, 4, 
29-30 

guiding principles for safe, 66 
and library design, 68-69 
linear models, 26-28, 51, 
61-62 

model validation, 63-66 
and molecular similarity/diversity methods, 194
multiple descriptors of molecular structure, 54-58
nonlinear models, 28-29, 51, 
62-63 

parameters used, 11-26 
problems with Q², 64-65
receptor theory development, 
4-7 

standard table, 51 
in structure-based design, 419 
substituent constants for, 
19-23 

taxonomy of approaches, 
52-54 

tools and techniques of, 7-11 
training and test set selection, 

65-66

variable selection, 60-63
as virtual screening tool, 

66-69

Quantitative structure-property 
relationships, 53 
and molecular similarity/diversity methods, 194
Quantum chemical indices, 11, 
14-15, 54 

Quantum mechanics, 100-103 
Quercetin, 865 
Query features, 381 
logical operators, 406 
Query structures, 368 
mapping, 380 
Quinacrine, 889,890-891 
Quinine, 888-891 
Quinolines, 889-890 
Quinupristin, 876-877 
Quisqualic acid, 694 
QXP 

Monte Carlo minimization, 
298 

Rabbits 

narcosis induction by ROH, 27 


Racemates, 782 
types of, 799-801 
Racemization, 783-784 
Radiation damage 

in electron cryomicroscopy, 
612-613,614-615,616 
Raffinate, 791 
Ramachandran plot, 92 
and conformational mimicry, 
141 

Ramipril, 746 

Ramiprilat, 746-747 
Random searching 
in virtual screening, 263 
Rapamycin, 848 
binding to FKBP, 552,554 
Rapid, reversible enzyme inhibitors, 720, 728-734
Rapid sequence screening, 334 
Rare gas interactions, 174 
Ras-farnesyltransferase inhibitors

non-peptide peptidomimetics, 
665-667,668,669 
template mimetics, 643,645 
Rats 

ataxia induction by ROH, 29 
liver DHFR, QSAR inhibition 
studies, 34 
REACCS, 398 

reaction searching using, 383 
Reactant-biased, product-based 
(RBPB) algorithm, 215,216, 
219 

Reacting centers, 366,383, 398, 
409 

Reaction Browser/Web, 387 
Reaction databases, 386 
Reaction field theory, 83 
Reaction indexing, 383 
Reaction Package, 386 
Reactions, See Chemical reactions

Reaction scheme, 409 
Reagent Selector, 387,391-392 
RECAP (Retrosynthetic Combinatorial
Analysis Procedure), 249

Receptor-based design 
NMR spectroscopy for, 510, 
532-562 

pharmacophore generation, 
259 

Receptor-based 3D QSAR, 304 
Receptor-ligand complexes, 78 
Receptor-ligand mimetics, 636 
Receptor mapping, 148-149 
Receptor-relevant subspace, 204, 222

Receptor theory, 4-7 
Reciprocal nearest neighbor, 220 
Recursive partitioning, 247-248 
Red clover extract 
LC-MS mass spectrum, 589, 
590 

Reduction 

enzyme-mediated asymmetric, 
806 

Refining, search queries, 409 
REFMAC, 478 

Registration, of chemical information, 377-379
Registry number, 378-379,409 
Relational databases, 363, 373, 
409 

Relative diversity/similarity, 209 
Relaxation parameters, in NMR, 
511,512 

changes on binding, 536-537 
and ligand dynamics, 528-531 
and NMR screening, 571-573 
in receptor-based design, 534 
Relenza 

structure-based design, 451 
Relibase, 315 
Reminyl, 892 
Renin inhibitors, 432 
molecular modeling, 123,153 
transition state analogs, 647 
REOS filtering tool, 225 
RESEARCH 

Monte Carlo simulated annealing, 297
Resiniferatoxin, 854 
Restrained electrostatic potential, 102

Result set, 409,411 
Retigotine, 783 
Retinoic acid 

docking and homology modeling, 305

stereoisomer analogs, 707 
X-ray crystallographic studies, 
492-493 

Retinoid X receptor 
X-ray crystallographic studies, 
493 

Retrosynthetic analysis, 409 
Retrothiorphan, 650, 651 
Reverse nuclear Overhauser effects pumping, 573
Reversible enzyme inhibitors, 
720 


RGD peptide sequence mimics, 
129,643,645,662-665 
Rgroups, 368,373,397,405, 
409-410 

and combinatorial library design, 221
Rhinoviruses

comparative molecular field 
analysis, 153 

molecular modeling of antiviral
binding to HRV-14, 120, 122

target of structure-based drug 
design, 454-456 
Rhodopeptin 

template mimetics, 644,645 
Ribbed melilot, drugs derived 
from, 882 

Rifamycin, 870,872 
Rigid analogs, 694-699 
Rigid body rotations, in molecular modeling, 90-91
Rigid docking, 262-263,293 
Rigid geometry approximation, 
in molecular modeling, 89 
Ring-position isomer analogs, 
699-704 
Rings 

in druglikeness screening, 245 
molecular comparisons, 139 
in molecular modeling, 91 
Ring-size change analogs, 
699-704 
Ritalin 

classical resolution, 793-794 
nonclassical resolution, 801 
Ritonavir, 648,659 
asymmetric synthesis, 
807-808,809 

structure-based design, 438, 
440 

Rivastigmine, 774 
structure-based design, 
449-450 
RNA 

molecular modeling, 154 
NMR structural determination, 535

RNA polymerase inhibitors, 717 
Ro-31-8959,121 
Ro-32-7315,652, 653 
Ro-46-2005,673,676 
ROCS, 256,259 
shape-based superposition, 
260 

Roll-up, 410 

Root structure, 368,404,410 


ROSDAL notation, 368,410 
Rosuvastatin, 848, 880-881 
Rosy periwinkle, vinca alkaloids 
from, 858 
Rotatable bonds 
in druglikeness screening, 245 
in molecular modeling, 90-91 
Royal Society of Chemistry 
Chemical Information 
Group, 360 
RPR109353, 211 
R,S descriptors, for chiral molecules, 365, 783

RS³ Discovery System, 377, 385
RSR-13,422,423 
RSR-56,422,423 
RTECS, 246 
RUBICON, 386 
virtual screening application, 
254 

"Rule of 5," See Lipinski's "rule 
of 5" 

S-37435,675 
Saccharomyces cerevisiae 
genome sequencing, 344 
Saccharopolyspora erythraea, 

874 

Salbutamol, 885,886 
Salmeterol, 885,886 
S-Salmeterol 

enzyme-mediated reduction,
806,808 
Salmonella 

mutagenicity prediction, 829, 
831-832,840,842-843 
Salt bridges, 285 
and virtual screening, 272 
Salts definitions, 376 
Salts search, 388 
Sampatrilat, 651 
Saquinavir, 648,659, 717 
structure-based design, 
435-437,440 

SAR-by-NMR approach, 508, 

516 

in NMR screening, 564-568,
576 

Sarin, 774 
Saturated rings 

analogs based on substitution 
of aromatic for saturated
ring; or the converse, 
699-704 

Saturation diversity approach, 
223 
Saturation transfer difference
NMR, 568-570 

SB203580 

structure-based design, 457 

SB209670, 211, 675 
SB214857,213 
SB242253 

structure-based design, 458 
Scaled particle theory, 84 
SCH 47307,667 
SCH 57939,808,810 
SCH 66701,667 
Schrödinger equation, 79, 363
Scientific and Technical Information Network, 597
SciFinder, 385 
SCOP, 353 

X-ray crystallography application, 494
ScoreDock 
assessment, 303 

Scoring functions, 261,264-266, 
306-312,322 
assessment, 312-315 
basic concepts, 289-290 
and molecular modeling, 
115-116 
overview of, 307 
penalty terms, 313 
Screening. See also Combinatorial
chemistry; High-throughput
screening; Virtual screening
mass spectrometry-based, 
597-598 

solid phase mass spectrometric, 606-607
ScreenScore, 319 
Sculpt, 387 
SEAL, 316,321 
Search queries, 368 
β-Secretase inhibitors
transition state analogs, 649 
Selector, 387 

SELECT program, 218-219,221 
SELECT statement, 404,406 
Self-Organizing Map method, 

65, 66 

Semirigid analogs, 694-699 
Sequence assembly, 342 
Sequence comparison, 334 
bioinformatics for, 346-347, 
352-353 

Sequence databases, 387 
Sequences, 363-364 
Sequential docking, 317 
Sequential simplex strategy, 11 


Serevent, 806 

Serial analysis of gene expression (SAGE), 344
Serine 

chemical modification reagents, 755

Serine peptidase inhibitors 
transition state analogs, 

652-655 

Serine protease inhibitors 
affinity labels, 762 
common structural motifs, 

494 

QSAR studies, 5 
Serotonin 

conformationally restricted 
analog, 696 

ring position analogs, 703-704 
Serotransferrin β
X-ray crystallographic studies, 
493 

Serum albumin 
binding of enantiomers, 786
mass-spectrometric binding 
assay screening, 604 
target of NMR screening studies, 567-568, 573
SFCHECK program, 478 
Sgroups, 373,397,405,410 
Shake and Bake, 477,478 
SHAPES 

NMR screening libraries, 575 
and SAR-by-NMR, 568 
SHARP, 478 
SHELX, 478 
Sialic acid, 450-451 
Sialidase 

genetic algorithm study of 
docking, 88-89 
Sickle-cell anemia, 419-425 
Side chains 

of known drugs, and druglikeness screening, 248-249
peptide-like, 635, 642 
Signature 

molecular similarity methods, 
188 

Similarity searching, 379, 

382-383, 410. See also Molecular
similarity/diversity methods

in molecular modeling, 

135-138 

and QSAR, 67-68 
SQL for, 395 


Simulated annealing. See also
Monte Carlo simulated annealing

and combinatorial library design, 217

with FOCUS-2D method, 68 
hydrogen bonds, 107 
in molecular similarity/diversity methods, 205
with QSAR, 53, 61 
in virtual screening, 263 
Simulated moving bed chromatography

for enantiomer separation, 
787,789-793,821 
Simvastatin, 719, 744,879,880 
Single nucleotide polymorphism 
(SNP) maps, 338-340
Single-wavelength anomalous 
diffraction phasing (SAD), 
477-478 

Sirolimus, 848,849 
Site-based pharmacophores, 
235-237 

Size-exclusion chromatography, 
599 

Sizofilan, 849 
SKF 107260,663 
SLIDE 

anchor and grow algorithm, 
296 

combinatorial docking, 317 
explicit water molecules, 302 
geometric/combinatorial 
search, 295 
ligand handling, 293 
protein flexibility, 301 
receptor representation in, 

291 

SLN (Sybyl Line Notation), 369, 
410 

Slow-binding enzyme inhibitors, 
720,734-740,749 
Slow-tight-binding enzyme inhibitors, 720, 734-740
SMART, 349 

functional group filters, 246 
SMILES notation, 254,410 
and canonical renumbering, 
378 

described, 368-369,371 
use with comparative QSAR, 
39 

SmoG, 311 

SN-6999,544 

Snowdrops, drugs derived from, 
892 
SNX-111, 851-852
SOCRATES, 361 
Sodium cromoglycate, 883-884 
Soergel distance, 68 
Solid Phase Synthesis database, 
385 

Solution molecular dynamics, 
528 

Solvation effects 
and docking scoring functions, 
307,308,310 
drug-receptor complexes, 
177-179 

molecular modeling, 83-85 
SOLVE, 478 
Somatostatin 

conformationally restricted 
peptidomimetics, 129,637, 
638 

receptor agonists found 
through combinatorial 
chemistry, 657 
template mimetics, 643-644, 
645 

Sorangium cellulosum, epothilones from, 864
SPC model, 175 
Specific structure, 368,403 
Sphere coloring, 296 
Sphere-exclusion, 207 
Spindle poisons, 867 
Spin-label NMR screening, 
573-574 

SPLICE, 89,113 
Spongothymidine, 867-868 
Spongouridine, 867-868 
SPRESI, 254 
SPRESI'95, 385
SQL (Structured Query Language), 395, 410
SR-48968,670 
SR-120107A, 670,673 
SRS (Sequence Retrieval System), 335
Standardization 
bioinformatics, 337 
Star schema, 390,391,410 
Statins, 719,848 
multisubstrate analogs, 
744-746 

Statistical mechanics, 94-95 
Stem cell factor 
X-ray crystallographic studies, 
493 

Stereoisomer analogs, 704-707 
Stereoisomers, 365-366, 

783-785 


Stereoplex, 387 
Stereoselective synthesis. See 
Asymmetric synthesis 
Steric parameters 
in QSAR, 23-25, 52 
STERIMOL parameters, 24, 50 
Steroid 5α-reductase inhibitors,
717,768-770 
QSAR studies, 37-38 
Steroids 

affinity for binding proteins, 
147 

biosynthesis inhibition, 770 
STN Express, 385 
STN International, 385 
STO-3G basis set, 175 
Storage, of chemical information, 373-377
Streptavidin 

free energies of binding, 286 
genetic algorithm study of biotin docking to, 89
interaction with biotin, 
181-183 

Streptogramins, 876-877 
Streptomyces, 876,891 
Streptomyces cattleya, 872 
Streptomyces clavuligerus, 869 
Streptomyces erythreus, 874 
Streptomyces griseus, 869 
Streptomyces venezuelae, 870
Streptomycin, 869-870 
Stromelysin 

flexible docking studies, 265 
NMR binding studies, 

555-557 

target of NMR screening studies
using SAR-by-NMR, 566
target of structure-based drug
design, 443-444 
Structural data mining, 410 
Structural frameworks of known 
drugs 

and druglikeness screening, 
248-249 

Structural genomics, 283 
and bioinformatics, 352-354
and X-ray crystallography, 
481,494-496 

Structural homology, See Homology

Structural similarity, 255 
Structure-activity relationships.
See also Quantitative structure-activity
relationships
and data mining, 66-67 
and molecular modeling, 134 


nonlinear, 62 

pharmacophore searching for 
generating, 255,272-273 
and toxicity prediction, 
828-843 

Structure-based drug design, 
358,417-419,467-469 
antifolate targets, 425-432 
and combinatorial chemistry, 
227 

combinatorial library design, 
225-228 

and docking studies, 282, 
321-322 

hemoglobin, 419-425 
hydrolases, 449-454 
iterative cycles, 282,463 
NMR spectroscopy for, 419, 

516-517

oxidoreductases, 445-449 
phosphoryl transferases, 
456-461 

picornaviruses, 454-456
proteases, 432-445 
and virtual screening, 244 
Structure-based inhibitor design, 418

Structure-based virtual screening, 260-267
Structure elucidation 
NMR spectroscopy for, 

517-525

Structure table, 376 
Structure verification 
as bottleneck in drug discovery, 592

mass spectrometry application, 594-596

Subgraph isomorphism, 67,405, 
410 

Subreum, 849 
Substance P antagonists, 
669-671 

Substances, 368,410 
Substituent constants, for 
QSAR, 19-23 

Substrate analog enzyme inhibitors, 733

Substructure searching, 255, 
379,381-382,410-411 
and QSAR, 67 
SQL for, 395 

Substructure search keys, 375, 
376,378,410 

molecular similarity/diversity 

methods, 189,221 
Subtilases

homology modeling, 123 
Succinate dehydrogenase, 733 
Succinic semialdehyde dehydrogenase inhibitors, 718
Succinyldicholine 
conformationally restricted 
analogs, 699 
Sugars 
chirality, 784 

Suicide substrate MMP inhibitors, 651-652
Suicide substrates, 756 
Sulbactam, 718 
Sulfonamides 
pharmacophore points, 249 
Sulfones 

pharmacophore points, 249 
Sulfonyl halides 
filtering from virtual screens, 
246 

Sulphonamides, 717 
Supercritical fluid chromatography

for enantiomer separation, 

787 

Supercritical fluid chromatography-mass
spectrometry (SFC-MS)

for combinatorial library purification, 594
Superstar, 315 

Superstructure search, 255,257, 
411 

Supervised data mining, 66-67, 
411 

Suxamethonium, 857 
Sweet clover, drugs derived 
from, 882 

Sweet wormwood, drugs derived 
from, 886 

SWISS-PROT, 335,345-346 

SYBYL, 130 

Sybyl Programming Language, 
378,410 

Synercid, 848,849, 876 
SYNLIB, 361 
SYSDOC 

ligand handling, 293 
Systematic search 
and Active Analog Approach, 
144-145 

and conformational analysis, 
89-93 

in docking methods, 292 
in molecular modeling, 89-94, 

116 


T. gondii DHFR, QSAR inhibition studies, 33
Tabular storage, 369-371 
Tabu search 

with docking methods, 292, 
299 

in virtual screening, 263 
Tachykinin receptors, 669 
Tacrine, 58 

structure-based design, 449 
Tacrolimus, 848,849 
Tadpoles 

narcotic action of ROH, 28-29 
Tagging approaches, 596-597 
TAK-029,213 
TAK-147 

structure-based design, 450 
Tandem mass spectrometry 
(MS-MS), 590-591
of combinatorial libraries, 592 
for structure determination of 
bioactive peptides, 518 
types of mass spectrometers, 
585 

Tanimoto coefficient, 68,202, 
411 

cluster-based methods with, 
206 

and similarity searching, 382, 
410 

for virtual screening, 210 
Tanimoto Dissimilarity, 220 
Tanomastat 

structure-based design, 
444-445,446 
TargetBASE, 348 
Target class approach, 188, 
228-234 

Target discovery. See also Drug 
targets 

bioinformatics for, 335, 
338-345 

TAR RNA inhibitors, 103 
Tautomerization
NMR spectroscopy, 526-528 
Tautomers, 366 

Tautomer search, 388,405-406 
Taxol, 843,848,861-863 
HMBC spectroscopy, 518 
NMR spectroscopy, 525-526, 
531 

Taxol side-chain, 803-804 
Taxus baccata (English yew), 
861-862 

Taxus brevifolia (Pacific yew), 
861-862 


TB36 

structure-based design, 
424-425 

TBC 3214,674,676 
Team Works, 377 
Teicoplanin, 849 
Telithromycin, 848, 876
Temperature 

molecular dynamic simulation, 96

Template mimetics (peptidomimetics), 643-644
Tendamistat 

NMR relaxation measurements, 528-529, 535
Teniposide, 867 
Teprotide, 746,881 
Terabyte, 411 
Terbinafine, 717 
Testosterone, 36, 768, 771 
Δ⁹-Tetrahydrocannabinol
(THC), 852-853 
Tetrahydrofolate, 425 
Tetrahymena pyriformis 
growth inhibition, 27, 37-38 
spiro-Tetraoxacycloalkanes 
ring-size analogs, 702-703 
Tetrazoles, 135 
as surrogates for cis-amide 
bond, 141-142 
Thalidomide, 783-784,785 
Thebaine, 850,851 
Theilheimer/Chiras/Metalysis
database, 386 

Therapeutic area screening 
molecular similarity/diversity 
methods, 191 

Thermodynamic cycle integration, 99-100, 120-121
Thermolysin inhibitors
genetic algorithm study of active site, 89

molecular modeling, 117, 120, 
121,151-153 

novel lead identification, 321 
transition-state analogs, 
749-750 

Thick clients, 400-401,411 
Thienamycin, 872, 874
Thin clients, 363,392,401,411 
Thiobiotin 

binding to avidin, 181, 182 
Thioesters 

filtering from virtual screens, 
246 

β-ThioGAR dideazafolate (β-TGDDF), 742-743
Thiol proteases
QSAR studies, 5 
Thiomuscimol, 690 
4-Thioquinone fluoromethide, 
770 

Thioridazine, 805,806 
Thiorphan, 650,651 
Thor database manager, 386 
Thor system, 377 
exact match searching, 
380-381

Threading, 123-125 
3D descriptors 

molecular similarity/diversity 
methods, 55-58, 191-201 
validation, 211-213 
Three-dimensional electron 
cryomicroscopy, 615-616 
3D models, 363,366-367, 
397-398 

3D pharmacophores 
filter cascade, 267 
for molecular similarity/diversity methods, 194-201
for searching, 381-382 
similarity searching, 189,383 
for virtual screening, 210, 
255-259 

3D quantitative structure-activity
relationships (3D-QSAR), 52, 53, 58-60
and molecular modeling, 115, 
138 

3D query features, 368, 381-382,
398 

3DSEARCH, 111,259 
3D structure databases, 387 
3-Point pharmacophores, 376, 
408 

molecular similarity methods, 
189, 195-196, 198 

for virtual screening, 210 
Threo- prefix, 784 
Threose 

enantiomers, 784 
Thrombin inhibitors, 227 
combinatorial docking, 318 
force field-based scoring study, 
307 

molecular modeling, 116 
non-peptide peptidomimetics, 
660-662,663,664 
seeding experiments, 319 
site-based pharmacophores, 
235-236 

target of structure-based drug 
design, 442-443 


Thromboxane A2, 762-763
Thymidine kinase inhibitors, 

717 

role of water in docking, 303 
X-ray crystallographic studies, 
493 

Thymidylate synthase inhibitors, 227, 717

target of structure-based drug 
design, 425,426-429 
Thymitaq, 428 
Thyroid hormones 
NMR spectroscopy, 529-531 
Thyroid receptor beta, 263 
Thyroliberin 
peptidomimetics, 129 
Thyrotropin-releasing hormone,
637 

Thyroxine 

NMR spectroscopy, 529-531 
Tight-binding enzyme inhibitors, 720, 734-740, 749
Time-of-flight mass spectrometry, 585, 607
Timolol 

renal clearance, 38 
TIP3P model, 175 
Tipranavir, 812,813 
Tirilazad mesylate, 849 
Titrations 

NMR application, 545 
TNF-α converting enzyme
(TACE), 652 
Tolamolol 
renal clearance, 38 
Tolrestat 

structure-based design, 448 
Tomudex, 427 
Toolkits, 386,411 
Toothpick plant, drugs derived 
from, 883 
TOPAS, 192 
TOPKAT, 246 
Topographical data, 411 
Topographical mimetics (peptidomimetics), 636
Topoisomerase II inhibitors, 717
Topological descriptors 
for druglikeness screening, 
247-249 

estimation systems, 388-389 
with QSAR, 54-55 
Topotecan, 848,849,861 
Torsional potential, 80 
Toxicity databases, 246,386 
development, 828-829 
Toxicity prediction, 827-843 


Toxicity screening 
as bottleneck in drug discovery, 592

and functional group filters, 
246-247 

pulsed ultrafiltration application, 605

Toxicophores, 829-831 
associated with allergic contact dermatitis, 830
C-Toxiferine I, 856, 857
TPCK, 760-761,762 
Tramadol, 782 
chromatographic separation, 
792 

classical resolution, 795-796 
metabolism, 786-787 
Transesterification 
enzyme-mediated asymmetric, 
805-806 

Transferred NOE technique, 

532 

and NMR screening, 572-573 
Transition-state analog enzyme 
inhibitors, 720, 748-754 
Transition state analog inhibitors, 646

peptide bond isosteres, 644 
7-Transmembrane G-protein- 
coupled receptors, 229-234 
Transpeptidase inhibitors, 717 
Transverse relaxation-optimized 
spectroscopy (TROSY), 515 
for macromolecular structure 
determination, 533,534 
Trees, 376-377, 411 
TrEMBL, 335,346 
Triazines 

QSAR studies of cellular
growth inhibition, 37-38
QSAR studies of DHFR inhibition, 31-33

Trimethoprim, 717, 719 
interaction with dihydrofolate 
reductase, 151,183 
structure-based design, 425 

α,ω-bis-Trimethylammonium
polymethylene compounds, 710

Trimetrexate 

interaction with dihydrofolate 
reductase, NMR spectroscopy, 531, 557-559
Triple resonance spectra, 514 
Tripos, Inc. databases, 387 
Tripos force field, 80 
t-RNA guanine transglycosylase
inhibitors 

novel lead identification, 321 
Trojan horse inactivators, 756 
Trypsin inhibitors 

molecular modeling, 120 
QSAR studies, 5, 25 
site-based pharmacophores, 
235-236 

Trypsinogen inhibitors 
molecular modeling, 116 
Tryptophan 

chemical modification reagents, 755
TSCA database, 386 
Tubby gene 

X-ray crystallographic function elucidation, 494
Tube curare, 856 
D-Tubocurarine 

drugs derived from, 856,857 
fragment analogs, 708-710 
β-Tubulin

X-ray crystallographic studies, 
483 

Tumor necrosis factor receptor 1 
X-ray crystallographic studies, 
493 

2D descriptors 

molecular similarity/diversity
methods, 191-194
with QSAR, 54-55 
validation, 211-213 
2D pharmacophore searching, 
383 

filter cascade, 267 
virtual screening, 255 
2D quantitative structure-activity
relationships (2D-QSAR), 52, 53
2D query features, 397 
2D structures, 364-366, 397 
conversion of names to, 373 
2-Point pharmacophores, 376 
Tyrosine 

chemical modification reagents, 755

Tyrosine kinase inhibitors 
molecular modeling, 130 

U-85548 

structure-based design, 
436-437,438 

Ugi reaction, 229,231, 232, 236 
UK QSAR and Cheminformatics 
Group, 360 
Ukrain, 849 


Uncompetitive inhibitors, 
729-730 
Unicode, 411 
UNITY, 259,377 
descriptors, 192,201 
in molecular modeling, 111 
novel lead identification, 320 
UNITY 2D, 212 
UNITY 3D, 363,387 
University of Manchester Bioinformatics
Education and
Research site (UMBER), 
335 

Unix, 396, 411 
Unsupervised data mining, 
66-67,412 
Urea 

pharmacophore points, 249 
Ureido resonance, 182 
USEPA Suite, 390 

VALIDATE, 116,310 
Vancomycin, 770 
Vancomycin-peptide complex 
binding affinity, 119 
van der Waals forces, 174,285 
and docking scoring, 308 
enzyme inhibitors, 723-724 
and molar refraction, 24 
molecular modeling, 79-80, 
81,82,89 
and QSAR, 6,7 
van der Waals radius, 79, 81, 
173 
Vanillin 

antisickling agent, 419-420 
Vanilloid receptors, 853-854 
VanX inhibitors, 770-771, 772 
VARCHAR data type, 412 
VARCHAR2 data type, 412 
Vector maps, 140-142 
Verapamil 

classical resolution, 798 
Verapamilic acid, 798 
Vidarabine, 717 
Vigabatrin, 718,766, 767,782 
Vinblastine, 858-859,860 
Vinca alkaloids, 858-860 
Vincristine, 858-859,860 
Vindesine, 859-860 
Vinorelbine, 849 
Viracept, 659 

structure-based design, 440, 
442 

Viral DNA polymerase inhibitors, 717

Virtual chemistry space, 67 


Virtual libraries, 237, 283,315 
handling large, 220-221 
and QSAR, 61 
Virtual rings, 91 
Virtual screening, 244-245, 
271-274, 315-317, 412. See
also Docking methods; Scoring functions
applications, 267-271 
basic concepts, 289-290 
combinatorial docking, 
317-318 

consensus scoring, 265-266, 
291,319-320 

docking as virtual screening 
tool, 266-267 
druglikeness screening, 
245-250 

filter cascade, 267 
focused screening libraries for 
lead identification, 250-252 
hydrogen bonding and hydrophobic interactions, 319
ligand-based, 188,209-214 
molecular similarity/diversity
methods for, 188,190, 
209-214 

novel lead identification, 
320-321 

pharmacophore screening, 
252-260 

QSAR as tool for, 66-69 
seeding experiments, 318-319 
structure-based, 260-267 
weak inhibitors, 319 
Vista search program, 387 
Vitamin D receptor 
X-ray crystallographic studies, 
493 
VK19911 

structure-based design, 458 
Voglibose, 849 
VolSurf program, 202 
Volume 

molecular dynamic simulation, 96

Volume mapping, 139-140 
Voronoi QSAR technique, 53 
VRML (Virtual Reality Markup 
Language), 405 
VX-497 

structure-based design, 447 

VX-745 

structure-based design, 458 

Warfarin, 882-883 
enantiomers, 786 

HIV protease inhibitor, 659, 

661 

nonclassical resolution, 801 
wARP, 478 
Water 

gas phase association thermodynamic functions, 178
importance of bound in structure-based design, 409
molecular modeling, 85 
octanol/water partitioning system, 16-17

and protein-ligand interactions, 288

role in docking, 302-303, 
313-314 

solvating effect in enzyme inhibitors, 722-723
Wellcome Registry, 222 
White-Bovill force field, 80 
WIN-35065-2 

dopamine transporter inhibitor, 268
WIN-51711 

structure-based design, 
454-455 
WIN-54954 

structure-based design, 455 


WIN-63843 

structure-based design, 
455-456 

Wiswesser line notation, 
368-369 

WIZARD, 255,260 
World Drug Index (WDI), 379, 
386,387 

World Patents Index (WPI), 386 

Xanthine-guanine phosphoribosyltransferase

X-ray crystallographic studies, 
493 

Xanthine oxidase inhibitors, 718 
XFIT graphics program, 478 
Ximelagatran 

structure-based design, 442 
XML (Extensible Markup Language), 371, 405, 412
XMLQuery, 412 
X-ray crystallography, 351, 
471-473,612 
applications, 479-481 
crystallization for, 473-474, 
480-481 

databases for, 478-479 
data collection, 474-476 
drug targets with published 
structures, 482-493 


and molecular modeling, 78 
phase problem, 476-478 
and QSAR, 5 

and structural genomics, 481, 
494-496

in structure-based drug design, 418, 419, 420
and structure-based library 
design, 225 

and virtual screening, 244 
X-ray diffraction, 472-473,614 
X-ray lenses, 612 

Yellow sweet clover, drugs derived from, 882
Yew tree, paclitaxel from, 
861-862 
YM-022,856 

Yukawa-Tsuno equation, 14 

Z-100,849 
Zanamivir, 717 
structure-based design, 451 
Ziconotide 

NMR spectroscopy, 518-523, 
526,534 
Zidovudine, 717 
Zingerone 

allergenicity prediction, 835 